WWW '23: Proceedings of the ACM Web Conference 2023

Full Citation in the ACM Digital Library

SESSION: Keynotes

Using diversity as a source of scientific innovation for the Web

The Web has become a resource that allows us to make sense of social phenomena around the world. This started the moment users became content creators, and has grown with the emergence of social platforms tailored to our need to connect and share with others. Throughout my work, I’ve come to appreciate how social media has democratized access to real-world news and social sentiment, while also witnessing the loss of trust created by fake information. As a computer scientist from Chile in Latin America, I have worked on a range of problems that were driven by local needs. Many times, I have tried to apply state of the art solutions to well-known problems, only to find that these don’t work outside of their initial evaluation dataset. In this talk, I’ll discuss how geographical, language, and social diversity have opened new avenues for innovation and better understanding the social Web. I’ll also show that to truly create useful technological solutions, we must develop inclusive research and resources.

Decolonizing Creative Labor in the age of AI

Creative AI has got us asking existential questions of what makes us human. To crack the code, you need to crack the culture that makes us who we are. Who and what is creative remains largely disconnected from diverse and global cultural norms, rendering existing technology suboptimal and even unusable to the world’s majority. Creativity has long been dictated by the aesthetic taste, values, needs, concerns, and aspirations of the West. Today, India and China alone account for the majority of the world’s users. The Global South are fast shaping data systems in ways that remain underexamined and siloed as “Rest of World” among industry and government folks. With the rise of the creator economy across sectors, questions abound on creative rights, provenance, fairness, labor, and representation. This talk discusses concerns around digital labor, data materiality, media literacies, creative value, and online expression. In doing so, it sets a pathway towards designing inclusive and intersectional systems that transcend borders.

Concept Regulation in the Social Sciences

The sciences, notably biology and medicine, operate with highly regulated taxonomies and ontologies. The Social Sciences, on the other hand, muddle through in a proverbial tower of Babel. There may be some real benefits to an undisciplined set of ideas, but also some real costs. Over the last ten years, political scientists have attempted to get their semantic act by cooperating to formalize their vocabulary. The result has been a dramatic improvement in how scholars diagnose and treat problems of democracy, as well as a set of web applications that have changed the way countries write constitutions. Nevertheless, these methods of semantic cooperation have exposed some persistent challenges of “social engineering,” ones that may have tractable web solutions.

GNNs and Graph Generative models for biomedical applications

Graph generative models are recently gaining significant interest in current application domains. They are commonly used to model social networks, knowledge graphs, and protein-protein interaction networks. In this talk we will present the potential of graph generative models and our recent relevant efforts in the biomedical domain. More specifically we present a novel architecture that generates medical records as graphs with privacy guarantees. We capitalize and modify the graph Variational autoencoders (VAEs) architecture. We train the generative model with the well known MIMIC medical database and achieve generated data that are very similar to the real ones yet provide privacy guarantees. We also develop new GNNs for predicting antibiotic resistance and other protein related downstream tasks such as enzymes classifications and Gene Ontology classification. We achieve there as well promising results with potential for future application in broader biomedical related tasks. Finally we present future research directions for multi modal generative models involving graphs.

CONNECTIVITY

The most important new fact about the human condition is that we are now suddenly connected. When I say “suddenly” I refer to the Internet’s birthday, October 29, 1969 and how two thirds of the human race, five billion people, are already on the Internet, in only 50 years. Suddenly. The Arpanet started the Internet in 1969 by networking time-shared minicomputers serving dumb character terminals. Then in 1973 Xerox Palo Alto Research Center (PARC) decided to put a personal computer on every desk. Ethernet was invented on May 22, 1973 to provide local connectivity among those PCs, one on every desk, if you can imagine that. The PARC Ethernet was formed by combining Jerrold coaxial vampire taps, Manchester on-off keying, and Alohanet randomized retransmissions. Then we wrapped it in internet protocols according to a layered reference model. Then we standardized it all: Ethernet, IP, TCP, TELNET, FTP, Mail, URL, HTML, HTTP. Ethernet evolved rapidly away from its Jerrold-Manchester-Alohanet prototype. Ethernet’s legacy is instead packets to the desktop, abundance of bandwidth, and standardization. Come hear all about it. GPT is writing my lecture now.

SESSION: Social Network Analysis and Graph Algorithms

GELTOR: A Graph Embedding Method based on Listwise Learning to Rank

Similarity-based embedding methods have introduced a new perspective on graph embedding by conforming the similarity distribution of latent vectors in the embedding space to that of nodes in the graph; they show significant effectiveness over conventional embedding methods in various machine learning tasks. In this paper, we first point out the three drawbacks of existing similarity-based embedding methods: inaccurate similarity computation, conflicting optimization goal, and impairing in/out-degree distributions. Then, motivated by these drawbacks, we propose AdaSim*, a novel similarity measure for graphs that is conducive to the similarity-based graph embedding. We finally propose GELTOR, an effective embedding method that employs AdaSim* as a node similarity measure and the concept of learning-to-rank in the embedding process. Contrary to existing methods, GELTOR does not learn the similarity scores distribution; instead, for any target node, GELTOR conforms the ranks of its top-t similar nodes in the embedding space to their original ranks based on AdaSim* scores. We conduct extensive experiments with six real-world datasets to evaluate the effectiveness of GELTOR in graph reconstruction, link prediction, and node classification tasks. Our experimental results show that (1) AdaSim* outperforms AdaSim, RWR, and MCT in computing nodes similarity in graphs, (2) our GETLOR outperforms existing state-of-the-arts and conventional embedding methods in most cases of the above machine learning tasks, thereby implying that learning-to-rank is beneficial to graph embedding.

Graph-less Collaborative Filtering

Graph neural networks (GNNs) have shown the power in representation learning over graph-structured user-item interaction data for collaborative filtering (CF) task. However, with their inherently recursive message propagation among neighboring nodes, existing GNN-based CF models may generate indistinguishable and inaccurate user (item) representations due to the over-smoothing and noise effect with low-pass Laplacian smoothing operators. In addition, the recursive information propagation with the stacked aggregators in the entire graph structures may result in poor scalability in practical applications. Motivated by these limitations, we propose a simple and effective collaborative filtering model (SimRec) that marries the power of knowledge distillation and contrastive learning. In SimRec, adaptive transferring knowledge is enabled between the teacher GNN model and a lightweight student network, to not only preserve the global collaborative signals, but also address the over-smoothing issue with representation recalibration. Empirical results on public datasets show that SimRec archives better efficiency while maintaining superior recommendation performance compared with various strong baselines. Our implementations are publicly available at: https://github.com/HKUDS/SimRec.

Fair Graph Representation Learning via Diverse Mixture-of-Experts

Graph Neural Networks (GNNs) have demonstrated a great representation learning capability on graph data and have been utilized in various downstream applications. However, real-world data in web-based applications (e.g., recommendation and advertising) always contains bias, preventing GNNs from learning fair representations. Although many works were proposed to address the fairness issue, they suffer from the significant problem of insufficient learnable knowledge with limited attributes after debiasing. To address this problem, we develop Graph-Fairness Mixture of Experts (G-Fame), a novel plug-and-play method to assist any GNNs to learn distinguishable representations with unbiased attributes. Furthermore, based on G-Fame, we propose G-Fame++, which introduces three novel strategies to improve the representation fairness from node representations, model layer, and parameter redundancy perspectives. In particular, we first present the embedding diversified method to learn distinguishable node representations. Second, we design the layer diversified strategy to maximize the output difference of distinct model layers. Third, we introduce the expert diversified method to minimize expert parameter similarities to learn diverse and complementary representations. Extensive experiments demonstrate the superiority of G-Fame and G-Fame++ in both accuracy and fairness, compared to state-of-the-art methods across multiple graph datasets.

Multi-Aspect Heterogeneous Graph Augmentation

Data augmentation has been widely studied as it can be used to improve the generalizability of graph representation learning models. However, existing works focus only on the data augmentation on homogeneous graphs. Data augmentation for heterogeneous graphs remains under-explored. Considering that heterogeneous graphs contain different types of nodes and links, ignoring the type information and directly applying the data augmentation methods of homogeneous graphs to heterogeneous graphs will lead to suboptimal results. In this paper, we propose a novel Multi-Aspect Heterogeneous Graph Augmentation framework named MAHGA. Specifically, MAHGA consists of two core augmentation strategies: structure-level augmentation and metapath-level augmentation. Structure-level augmentation pays attention to network schema aspect and designs a relation-aware conditional variational auto-encoder that can generate synthetic features of neighbors to augment the nodes and the node types with scarce links. Metapath-level augmentation concentrates on metapath aspect, which constructs metapath reachable graphs for different metapaths and estimates the graphons of them. By sampling and mixing up based on the graphons, MAHGA yields intra-metapath and inter-metapath augmentation. Finally, we conduct extensive experiments on multiple benchmarks to validate the effectiveness of MAHGA. Experimental results demonstrate that our method improves the performances across a set of heterogeneous graph learning models and datasets.

Testing Cluster Properties of Signed Graphs

This work initiates the study of property testing in signed graphs, where every edge has either a positive or a negative sign. We show that there exist sublinear query and time algorithms for testing three key properties of signed graphs: balance (or 2-clusterability), clusterability and signed triangle freeness. We consider both the dense graph model, where one queries the adjacency matrix entries of a signed graph, and the bounded-degree model, where one queries for the neighbors of a node and the sign of the connecting edge. Our algorithms use a variety of tools from unsigned graph property testing, as well as reductions from one setting to the other. Our main technical contribution is a sublinear algorithm for testing clusterability in the bounded-degree model. This contrasts with the property of k-clusterability in unsigned graphs, which is not testable with a sublinear number of queries in the bounded-degree model. We experimentally evaluate the complexity and usefulness of several of our testers on real-life and synthetic datasets.

RSGNN: A Model-agnostic Approach for Enhancing the Robustness of Signed Graph Neural Networks

Signed graphs model complex relations using both positive and negative edges. Signed graph neural networks (SGNN) are powerful tools to analyze signed graphs. We address the vulnerability of SGNN to potential edge noise in the input graph. Our goal is to strengthen existing SGNN allowing them to withstand edge noises by extracting robust representations for signed graphs. First, we analyze the expressiveness of SGNN using an extended Weisfeiler-Lehman (WL) graph isomorphism test and identify the limitations to SGNN over triangles that are unbalanced. Then, we design some structure-based regularizers to be used in conjunction with an SGNN that highlight intrinsic properties of a signed graph. The tools and insights above allow us to propose a novel framework, Robust Signed Graph Neural Network (RSGNN), which adopts a dual architecture that simultaneously denoises the graph while learning node representations. We validate the performance of our model empirically on four real-world signed graph datasets, i.e., Bitcoin_OTC, Bitcoin_Alpha, Epinion and Slashdot, RSGNN can clearly improve the robustness of popular SGNN models. When the signed graphs are affected by random noise, our method outperforms baselines by up to 9.35% Binary-F1 for link sign prediction. Our implementation is available in PyTorch1.

NeuKron: Constant-Size Lossy Compression of Sparse Reorderable Matrices and Tensors

Many real-world data are naturally represented as a sparse reorderable matrix, whose rows and columns can be arbitrarily ordered (e.g., the adjacency matrix of a bipartite graph). Storing a sparse matrix in conventional ways requires an amount of space linear in the number of non-zeros, and lossy compression of sparse matrices (e.g., Truncated SVD) typically requires an amount of space linear in the number of rows and columns. In this work, we propose NeuKron for compressing a sparse reorderable matrix into a constant-size space. NeuKron generalizes Kronecker products using a recurrent neural network with a constant number of parameters. NeuKron updates the parameters so that a given matrix is approximated by the product and reorders the rows and columns of the matrix to facilitate the approximation. The updates take time linear in the number of non-zeros in the input matrix, and the approximation of each entry can be retrieved in logarithmic time. We also extend NeuKron to compress sparse reorderable tensors (e.g. multi-layer graphs), which generalize matrices. Through experiments on ten real-world datasets, we show that NeuKron is (a) Compact: requiring up to five orders of magnitude less space than its best competitor with similar approximation errors, (b) Accurate: giving up to 10 × smaller approximation error than its best competitors with similar size outputs, and (c) Scalable: successfully compressing a matrix with over 230 million non-zero entries.

Multi-aspect Diffusion Network Inference

To learn influence relationships between nodes in a diffusion network, most existing approaches resort to precise timestamps of historical node infections. The target network is customarily assumed as an one-aspect diffusion network, with homogeneous influence relationships. Nonetheless, tracing node infection timestamps is often infeasible due to high cost, and the type of influence relationships may be heterogeneous because of the diversity of propagation media. In this work, we study how to infer a multi-aspect diffusion network with heterogeneous influence relationships, using only node infection statuses that are more readily accessible in practice. Equipped with a probabilistic generative model, we iteratively conduct a posteriori, quantitative analysis on historical diffusion results of the network, and infer the structure and strengths of homogeneous influence relationships in each aspect. Extensive experiments on both synthetic and real-world networks are conducted, and the results verify the effectiveness and efficiency of our approach.

Collaboration-Aware Graph Convolutional Network for Recommender Systems

Graph Neural Networks (GNNs) have been successfully adopted in recommender systems by virtue of the message-passing that implicitly captures collaborative effect. Nevertheless, most of the existing message-passing mechanisms for recommendation are directly inherited from GNNs without scrutinizing whether the captured collaborative effect would benefit the prediction of user preferences. In this paper, we first analyze how message-passing captures the collaborative effect and propose a recommendation-oriented topological metric, Common Interacted Ratio (CIR), which measures the level of interaction between a specific neighbor of a node with the rest of its neighbors. After demonstrating the benefits of leveraging collaborations from neighbors with higher CIR, we propose a recommendation-tailored GNN, Collaboration-Aware Graph Convolutional Network (CAGCN), that goes beyond 1-Weisfeiler-Lehman(1-WL) test in distinguishing non-bipartite-subgraph-isomorphic graphs. Experiments on six benchmark datasets show that the best CAGCN variant outperforms the most representative GNN-based recommendation model, LightGCN, by nearly 10% in Recall@20 and also achieves around 80% speedup. Our code/supplementary is at https://github.com/YuWVandy/CAGCN.

Pairwise-interactions-based Bayesian Inference of Network Structure from Information Cascades

An explicit network structure plays an important role when analyzing and understanding diffusion processes. In many scenarios, however, the interactions between nodes in an underlying network are unavailable. Although many methods for inferring a network structure from observed cascades have been proposed, they did not perceive the relationship between pairwise interactions in a cascade. Therefore, this paper proposes a Pairwise-interactions-based Bayesian Inference method (named PBI) to infer the underlying diffusion network structure. More specifically, to get more accurate inference results, we measure the weights of each candidate pairwise interaction in different cascades and add them to the likelihood of a contagion process. In addition, a pre-pruning work is introduced for candidate edges to further improve the inference efficiency. Experiments on synthetic and real-world networks show that PBI achieves significantly better results.

Encoding Node Diffusion Competence and Role Significance for Network Dismantling

Percolation theory shows that removing a small fraction of critical nodes can lead to the disintegration of a large network into many disconnected tiny subnetworks. The network dismantling task focuses on how to efficiently select the least such critical nodes. Most existing approaches focus on measuring nodes’ importance from either functional or topological viewpoint. Different from theirs, we argue that nodes’ importance can be measured from both of the two complementary aspects: The functional importance can be based on the nodes’ competence in relaying network information; While the topological importance can be measured from nodes’ regional structural patterns. In this paper, we propose an unsupervised learning framework for network dismantling, called DCRS, which encodes and fuses both node diffusion competence and role significance. Specifically, we propose a graph diffusion neural network which emulates information diffusion for competence encoding; We divide nodes with similar egonet structural patterns into a few roles, and construct a role graph on which to encode node role significance. The DCRS converts and fuses the two encodings to output a final ranking score for selecting critical nodes. Experiments on both real-world networks and synthetic networks demonstrate that our scheme significantly outperforms the state-of-the-art competitors for its mostly requiring much fewer nodes to dismantle a network.

Hierarchical Knowledge Graph Learning Enabled Socioeconomic Indicator Prediction in Location-Based Social Network

Socioeconomic indicators reflect location status from various aspects such as demographics, economy, crime and land usage, which play an important role in the understanding of location-based social networks (LBSNs). Especially, several existing works leverage multi-source data for socioeconomic indicator prediction in LBSNs, which however fail to capture semantic information as well as distil comprehensive knowledge therein. On the other hand, knowledge graph (KG), which distils semantic knowledge from multi-source data, has been popular in recent LBSN research, which inspires us to introduce KG for socioeconomic indicator prediction in LBSNs. Specifically, we first construct a location-based KG (LBKG) to integrate various kinds of knowledge from heterogeneous LBSN data, including locations and other related elements like point of interests (POIs), business areas as well as various relationships between them, such as spatial proximity and functional similarity. Then we propose a hierarchical KG learning model to capture both global knowledge from LBKG and domain knowledge from several sub-KGs. Extensive experiments on three datasets demonstrate our model’s superiority over state-of-the-art methods in socioeconomic indicators prediction. Our code is released at: https://github.com/tsinghua-fib-lab/KG-socioeconomic-indicator-prediction.

Opinion Maximization in Social Networks via Leader Selection

We study a leader selection problem for the DeGroot model of opinion dynamics in a social network with n nodes and m edges, in the presence of s0 = O(1) leaders with opinion 0. Concretely, we consider the problem of maximizing the average opinion in equilibrium by selecting k = O(1) leaders with opinion 1 from the remaining n − s0 nodes, which was previously proved to be NP-hard. A deterministic greedy algorithm was also proposed to approximately solve the problem, which has an approximation factor (1 − 1/e) and time complexity O(n3), and thus does not apply to large networks.

In this paper, we first give an interpretation for the opinion of each node in equilibrium and the disagreement of the model from the perspective of resistor networks. We then develop a fast randomized greedy algorithm to solve the problem. To this end, we express the average opinion in terms of the pseudoinverse and Schur complement of Laplacian matrix for . The key ingredients of our randomized algorithm are Laplacian solvers and node sparsifiers, where the latter can preserve pairwise effective resistance by viewing Schur complement as random walks with average length l. For any error parameter ϵ > 0, at each iteration, the randomized algorithm selects a node that deviates from the local optimum marginal gain at most ϵ. The time complexity of the fast algorithm is O(mkllog nϵ− 2). Extensive experiments on various real networks show that the effectiveness of our randomized algorithm is similar to that of the deterministic algorithm, both of which are better than several baseline algorithms, and that our randomized algorithm is more efficient and scalable to large graphs with more than one million nodes.

SeeGera: Self-supervised Semi-implicit Graph Variational Auto-encoders with Masking

Generative graph self-supervised learning (SSL) aims to learn node representations by reconstructing the input graph data. However, most existing methods focus on unsupervised learning tasks only and very few work has shown its superiority over the state-of-the-art graph contrastive learning (GCL) models, especially on the classification task. While a very recent model has been proposed to bridge the gap, its performance on unsupervised learning tasks is still unknown. In this paper, to comprehensively enhance the performance of generative graph SSL against other GCL models on both unsupervised and supervised learning tasks, we propose the SeeGera model, which is based on the family of self-supervised variational graph auto-encoder (VGAE). Specifically, SeeGera adopts the semi-implicit variational inference framework, a hierarchical variational framework, and mainly focuses on feature reconstruction and structure/feature masking. On the one hand, SeeGera co-embeds both nodes and features in the encoder and reconstructs both links and features in the decoder. Since feature embeddings contain rich semantic information on features, they can be combined with node embeddings to provide fine-grained knowledge for feature reconstruction. On the other hand, SeeGera adds an additional layer for structure/feature masking to the hierarchical variational framework, which boosts the model generalizability. We conduct extensive experiments comparing SeeGera with 9 other state-of-the-art competitors. Our results show that SeeGera can compare favorably against other state-of-the-art GCL methods in a variety of unsupervised and supervised learning tasks.

Graph Self-supervised Learning with Augmentation-aware Contrastive Learning

Graph self-supervised learning aims to mine useful information from unlabeled graph data, and has been successfully applied to pre-train graph representations. Many existing approaches use contrastive learning to learn powerful embeddings by learning contrastively from two augmented graph views. However, none of these graph contrastive methods fully exploits the diversity of different augmentations, and hence is prone to overfitting and limited generalization ability of learned representations. In this paper, we propose a novel Graph Self-supervised Learning method with Augmentation-aware Contrastive Learning. Our method is based on the finding that the pre-trained model after adding augmentation diversity can achieve better generalization ability. To make full use of the information from the diverse augmentation method, this paper constructs new augmentation-aware prediction task which complementary with the contrastive learning task. Similar to how pre-training requires fast adaptation to different downstream tasks, we simulate train-test adaptation on the constructed tasks for further enhancing the learning ability; this strategy can be deemed as a form of meta-learning. Experimental results show that our method outperforms previous methods and learns better representations for a variety of downstream tasks.

Enhancing Hierarchy-Aware Graph Networks with Deep Dual Clustering for Session-based Recommendation

Session-based Recommendation aims at predicting the next interacted item based on short anonymous behavior sessions. However, existing solutions neglect to model two inherent properties of sequential representing distributions, i.e., hierarchy structures resulted from item popularity and collaborations existing in both intra- and inter-session. Tackling with these two factors at the same time is challenging. On the one hand, traditional Euclidean space utilized in previous studies fails to capture hierarchy structures due to a restricted representation ability. On the other hand, the intuitive apply of hyperbolic geometry could extract hierarchical patterns but more emphasis on degree distribution weakens intra- and inter-session collaborations. To address the challenges, we propose a Hierarchy-Aware Dual Clustering Graph Network (HADCG) model for session-based recommendation. Towards the first challenge, we design the hierarchy-aware graph modeling module which converts sessions into hyperbolic session graphs, adopting hyperbolic geometry in propagation and attention mechanism so as to integrate chronological and hierarchical information. As for the second challenge, we introduce the deep dual clustering module which develops a two-level clustering strategy, i.e., information regularizer for intra-session clustering and contrastive learner for inter-session clustering, to enhance hyperbolic representation learning from collaborative perspectives and further promote recommendation performance. Extensive experiments on three real-world datasets demonstrate the effectiveness of the proposed HADCG.

Unifying and Improving Graph Convolutional Neural Networks with Wavelet Denoising Filters

Graph convolutional neural network (GCN) is a powerful deep learning framework for network data. However, variants of graph neural architectures can lead to drastically different performance on different tasks. Model comparison calls for a unifying framework with interpretability and principled experimental procedures. Based on the theories from graph signal processing (GSP), we show that GCN’s capability is fundamentally limited by the uncertainty principle, and wavelets provide a controllable trade-off between local and global information. We adapt wavelet denoising filters to the graph domain, unifying popular variants of GCN under a common interpretable mathematical framework. Furthermore, we propose WaveThresh and WaveShrink which are novel GCN models based on proven denoising filters from the signal processing literature. Empirically, we evaluate our models and other popular GCNs under a more principled procedure and analyze how trade-offs between local and global graph signals can lead to better performance in different datasets.

HyConvE: A Novel Embedding Model for Knowledge Hypergraph Link Prediction with Convolutional Neural Networks

Knowledge hypergraph embedding, which projects entities and n-ary relations into a low-dimensional continuous vector space to predict missing links, remains a challenging area to be explored despite the ubiquity of n-ary relational facts in the real world. Currently, knowledge hypergraph link prediction methods are essentially simple extensions of those used in knowledge graphs, where n-ary relational facts are decomposed into different subelements. Convolutional neural networks have been shown to have remarkable information extraction capabilities in previous work on knowledge graph link prediction. In this paper, we propose a novel embedding-based knowledge hypergraph link prediction model named HyConvE, which exploits the powerful learning ability of convolutional neural networks for effective link prediction. Specifically, we employ 3D convolution to capture the deep interactions of entities and relations to efficiently extract explicit and implicit knowledge in each n-ary relational fact without compromising its translation property. In addition, appropriate relation and position-aware filters are utilized sequentially to perform two-dimensional convolution operations to capture the intrinsic patterns and position information in each n-ary relation, respectively. Extensive experimental results on real datasets of knowledge hypergraphs and knowledge graphs demonstrate the superior performance of HyConvE compared with state-of-the-art baselines.

Efficient Approximation Algorithms for the Diameter-Bounded Max-Coverage Group Steiner Tree Problem

The Diameter-bounded max-Coverage Group Steiner Tree (DCGST) problem has recently been proposed as an expressive way of formulating keyword-based search and exploration of knowledge graphs. It aims at finding a diameter-bounded tree which covers the most given groups of vertices and has the minimum weight. In contrast to its specialization—the classic Group Steiner Tree (GST) problem which has been extensively studied, the emerging DCGST problem still lacks an efficient algorithm. In this paper, we propose Cba, the first approximation algorithm for the DCGST problem, and we prove its worst-case approximation ratio. Furthermore, we incorporate a best-first search strategy with two pruning methods into PrunedCBA, an improved approximation algorithm. Our extensive experiments on real and synthetic graphs demonstrate the effectiveness and efficiency of PrunedCBA.

Neighborhood Structure Configuration Models

We develop a new method to efficiently sample synthetic networks that preserve the d-hop neighborhood structure of a given network for any given d. The proposed algorithm trades off the diversity in network samples against the depth of the neighborhood structure that is preserved. Our key innovation is to employ a colored Configuration Model with colors derived from iterations of the so-called Color Refinement algorithm. We prove that with increasing iterations the preserved structural information increases: the generated synthetic networks and the original network become more and more similar, and are eventually indistinguishable in terms of centrality measures such as PageRank, HITS, Katz centrality and eigenvector centrality. Our work enables to efficiently generate samples with a precisely controlled similarity to the original network, especially for large networks.

CurvDrop: A Ricci Curvature Based Approach to Prevent Graph Neural Networks from Over-Smoothing and Over-Squashing

Graph neural networks (GNNs) are powerful models to handle graph data and can achieve state-of-the-art in many critical tasks including node classification and link prediction. However, existing graph neural networks still face both challenges of over-smoothing and over-squashing based on previous literature. To this end, we propose a new Curvature-based topology-aware Dropout sampling technique named CurvDrop, in which we integrate the Discrete Ricci Curvature into graph neural networks to enable more expressive graph models. Also, this work can improve graph neural networks by quantifying connections in graphs and using structural information such as community structures in graphs. As a result, our method can tackle the both challenges of over-smoothing and over-squashing with theoretical justification. Also, numerous experiments on public datasets show the effectiveness and robustness of our proposed method. The code and data are released in https://github.com/liu-yang-maker/Curvature-based-Dropout.

Disentangling Degree-related Biases and Interest for Out-of-Distribution Generalized Directed Network Embedding

The goal of directed network embedding is to represent the nodes in a given directed network as embeddings that preserve the asymmetric relationships between nodes. While a number of directed network embedding methods have been proposed, we empirically show that the existing methods lack out-of-distribution generalization abilities against degree-related distributional shifts. To mitigate this problem, we propose ODIN (Out-of-Distribution Generalized Directed Network Embedding), a new directed NE method where we model multiple factors in the formation of directed edges. Then, for each node, ODIN learns multiple embeddings, each of which preserves its corresponding factor, by disentangling interest factors and biases related to in- and out-degrees of nodes. Our experiments on four real-world directed networks demonstrate that disentangling multiple factors enables ODIN to yield out-of-distribution generalized embeddings that are consistently effective under various degrees of shifts in degree distributions. Specifically, ODIN universally outperforms 9 state-of-the-art competitors in 2 LP tasks on 4 real-world datasets under both identical distribution (ID) and non-ID settings. The code is available at https://github.com/hsyoo32/odin.

ConsRec: Learning Consensus Behind Interactions for Group Recommendation

Since group activities have become very common in daily life, there is an urgent demand for generating recommendations for a group of users, referred to as group recommendation task. Existing group recommendation methods usually infer groups’ preferences via aggregating diverse members’ interests. Actually, groups’ ultimate choice involves compromises between members, and finally, an agreement can be reached. However, existing individual information aggregation lacks a holistic group-level consideration, failing to capture the consensus information. Besides, their specific aggregation strategies either suffer from high computational costs or become too coarse-grained to make precise predictions.

To solve the aforementioned limitations, in this paper, we focus on exploring consensus behind group behavior data. To comprehensively capture the group consensus, we innovatively design three distinct views which provide mutually complementary information to enable multi-view learning, including member-level aggregation, item-level tastes, and group-level inherent preferences. To integrate and balance the multi-view information, an adaptive fusion component is further proposed. As to member-level aggregation, different from existing linear or attentive strategies, we design a novel hypergraph neural network that allows for efficient hypergraph convolutional operations to generate expressive member-level aggregation. We evaluate our ConsRec on two real-world datasets and experimental results show that our model outperforms state-of-the-art methods. An extensive case study also verifies the effectiveness of consensus modeling.

A Post-Training Framework for Improving Heterogeneous Graph Neural Networks

Recent years have witnessed the success of heterogeneous graph neural networks (HGNNs) in modeling heterogeneous information networks (HINs). In this paper, we focus on the benchmark task of HGNNs, i.e., node classification, and empirically find that typical HGNNs are not good at predicting the label of a test node whose receptive field (1) has few training nodes from the same category or (2) has multiple training nodes from different categories. A possible explanation is that their message passing mechanisms may involve noises from different categories, and cannot fully explore task-specific knowledge such as the label dependency between distant nodes. Therefore, instead of introducing a new HGNN model, we propose a general post-training framework that can be applied on any pretrained HGNNs to further inject task-specific knowledge and enhance their prediction performance. Specifically, we first design an auxiliary system that estimates node labels based on (1) a global inference module of multi-channel label propagation and (2) a local inference module of network schema-aware prediction. The mechanism of our auxiliary system can complement the pretrained HGNNs by providing extra task-specific knowledge. During the post-training process, we will strengthen both system-level and module-level consistencies to encourage the cooperation between a pretrained HGNN and our auxiliary system. In this way, both systems can learn from each other for better performance. In experiments, we apply our framework to four typical HGNNs. Experimental results on three benchmark datasets show that compared with pretrained HGNNs, our post-training framework can enhance Micro-F1 by a relative improvement of on average. Code, data and appendix are available at https://github.com/GXM1141/HGPF.

Link Prediction on Latent Heterogeneous Graphs

On graph data, the multitude of node or edge types gives rise to heterogeneous information networks (HINs). To preserve the heterogeneous semantics on HINs, the rich node/edge types become a cornerstone of HIN representation learning. However, in real-world scenarios, type information is often noisy, missing or inaccessible. Assuming no type information is given, we define a so-called latent heterogeneous graph (LHG), which carries latent heterogeneous semantics as the node/edge types cannot be observed. In this paper, we study the challenging and unexplored problem of link prediction on an LHG. As existing approaches depend heavily on type-based information, they are suboptimal or even inapplicable on LHGs. To address the absence of type information, we propose a model named LHGNN, based on the novel idea of semantic embedding at node and path levels, to capture latent semantics on and between nodes. We further design a personalization function to modulate the heterogeneous contexts conditioned on their latent semantics w.r.t. the target node, to enable finer-grained aggregation. Finally, we conduct extensive experiments on four benchmark datasets, and demonstrate the superior performance of LHGNN.

Predicting the Silent Majority on Graphs: Knowledge Transferable Graph Neural Network

Graphs consisting of vocal nodes ("the vocal minority") and silent nodes ("the silent majority"), namely VS-Graph, are ubiquitous in the real world. The vocal nodes tend to have abundant features and labels. In contrast, silent nodes only have incomplete features and rare labels, e.g., the description and political tendency of politicians (vocal) are abundant while not for ordinary civilians (silent) on the twitter’s social network. Predicting the silent majority remains a crucial yet challenging problem. However, most existing Graph Neural Networks (GNNs) assume that all nodes belong to the same domain, without considering the missing features and distribution-shift between domains, leading to poor ability to deal with VS-Graph. To combat the above challenges, we propose Knowledge Transferable Graph Neural Network (KTGNN), which models distribution-shifts during message passing and learns representation by transferring knowledge from vocal nodes to silent nodes. Specifically, we design the domain-adapted "feature completion and message passing mechanism" for node representation learning while preserving domain difference. And a knowledge transferable classifier based on KL-divergence is followed. Comprehensive experiments on real-world scenarios (i.e., company financial risk assessment and political elections) demonstrate the superior performance of our method. Our source code has been open-sourced1.

Lightweight source localization for large-scale social networks

The rapid diffusion of hazardous information in large-flow-based social media causes great economic losses and potential threats to society. It is crucial to infer the inner information source as early as possible to prevent further losses. However, existing localization methods wait until all deployed sensors obtain propagation information before starting source inference within a network, and hence the best opportunity to control propagation is missed. In this paper, we propose a new localization strategy based on finite deployed sensors, named Greedy-coverage-based Rapid Source Localization (GRSL), to rapidly, flexibly and accurately infer the source in the early propagation stage of large-scale networks. There are two phases in GRSL. In the first phase, the Greedy-based Strategy (GS) greedily deploys sensors to rapidly achieve wide area coverage at a low cost. In the second phase, when a propagation event within a network is observed by a part of the sensors, the Inference Strategy (IS) with an earlier response mechanism begins executing the source inference task in an earlier small infected area. Comprehensive experiments with the SOTA methods demonstrate the superior performance and robustness of GRSL in various application scenarios.

Automated Spatio-Temporal Graph Contrastive Learning

Among various region embedding methods, graph-based region relation learning models stand out, owing to their strong structure representation ability for encoding spatial correlations with graph neural networks. Despite their effectiveness, several key challenges have not been well addressed in existing methods: i) Data noise and missing are ubiquitous in many spatio-temporal scenarios due to a variety of factors. ii) Input spatio-temporal data (e.g., mobility traces) usually exhibits distribution heterogeneity across space and time. In such cases, current methods are vulnerable to the quality of the generated region graphs, which may lead to suboptimal performance. In this paper, we tackle the above challenges by exploring the Automated Spatio-Temporal graph contrastive learning paradigm (AutoST) over the heterogeneous region graph generated from multi-view data sources. Our AutoST framework is built upon a heterogeneous graph neural architecture to capture the multi-view region dependencies with respect to POI semantics, mobility flow patterns and geographical positions. To improve the robustness of our GNN encoder against data noise and distribution issues, we design an automated spatio-temporal augmentation scheme with a parameterized contrastive view generator. AutoST can adapt to the spatio-temporal heterogeneous graph with multi-view semantics well preserved. Extensive experiments for three downstream spatio-temporal mining tasks on several real-world datasets demonstrate the significant performance gain achieved by our AutoST over a variety of baselines. The code is publicly available at https://github.com/HKUDS/AutoST.

Graph Neural Networks with Diverse Spectral Filtering

Spectral Graph Neural Networks (GNNs) have achieved tremendous success in graph machine learning, with polynomial filters applied for graph convolutions, where all nodes share the identical filter weights to mine their local contexts. Despite the success, existing spectral GNNs usually fail to deal with complex networks (e.g., WWW) due to such homogeneous spectral filtering setting that ignores the regional heterogeneity as typically seen in real-world networks. To tackle this issue, we propose a novel diverse spectral filtering (DSF) framework, which automatically learns node-specific filter weights to exploit the varying local structure properly. Particularly, the diverse filter weights consist of two components — A global one shared among all nodes, and a local one that varies along network edges to reflect node difference arising from distinct graph parts — to balance between local and global information. As such, not only can the global graph characteristics be captured, but also the diverse local patterns can be mined with awareness of different node positions. Interestingly, we formulate a novel optimization problem to assist in learning diverse filters, which also enables us to enhance any spectral GNNs with our DSF framework. We showcase the proposed framework on three state-of-the-arts including GPR-GNN, BernNet, and JacobiConv. Extensive experiments over 10 benchmark datasets demonstrate that our framework can consistently boost model performance by up to 4.92% in node classification tasks, producing diverse filters with enhanced interpretability.

Characterization of Simplicial Complexes by Counting Simplets Beyond Four Nodes

Simplicial complexes are higher-order combinatorial structures which have been used to represent real-world complex systems. In this paper, we concentrate on the local patterns in simplicial complexes called simplets, a generalization of graphlets. We formulate the problem of counting simplets of a given size in a given simplicial complex. For this problem, we extend a sampling algorithm based on color coding from graphs to simplicial complexes, with essential technical novelty. We theoretically analyze our proposed algorithm named SC3, showing its correctness, unbiasedness, convergence, and time/space complexity. Through the extensive experiments on sixteen real-world datasets, we show the superiority of SC3 in terms of accuracy, speed, and scalability, compared to the baseline methods. Finally, we use the counts given by SC3 for simplicial complex analysis, especially for characterization, which is further used for simplicial complex clustering, where SC3 shows a strong ability of characterization with domain-based similarity.

Robust Mid-Pass Filtering Graph Convolutional Networks

Graph convolutional networks (GCNs) are currently the most promising paradigm for dealing with graph-structure data, while recent studies have also shown that GCNs are vulnerable to adversarial attacks. Thus developing GCN models that are robust to such attacks become a hot research topic. However, the structural purification learning-based or robustness constraints-based defense GCN methods are usually designed for specific data or attacks, and introduce additional objective that is not for classification. Extra training overhead is also required in their design. To address these challenges, we conduct in-depth explorations on mid-frequency signals on graphs and propose a simple yet effective Mid-pass filter GCN (Mid-GCN). Theoretical analyses guarantee the robustness of signals through the mid-pass filter, and we also shed light on the properties of different frequency signals under adversarial attacks. Extensive experiments on six benchmark graph data further verify the effectiveness of our designed Mid-GCN in node classification accuracy compared to state-of-the-art GCNs under various adversarial attack strategies.

Semi-decentralized Federated Ego Graph Learning for Recommendation

Collaborative filtering (CF) based recommender systems are typically trained based on personal interaction data (e.g., clicks and purchases) that could be naturally represented as ego graphs. However, most existing recommendation methods collect these ego graphs from all users to compose a global graph to obtain high-order collaborative information between users and items, and these centralized CF recommendation methods inevitably lead to a high risk of user privacy leakage. Although recently proposed federated recommendation systems can mitigate the privacy problem, they either restrict the on-device local training to an isolated ego graph or rely on an additional third-party server to access other ego graphs resulting in a cumbersome pipeline, which is hard to work in practice. In addition, existing federated recommendation systems require resource-limited devices to maintain the entire embedding tables resulting in high communication costs.

In light of this, we propose a semi-decentralized federated ego graph learning framework for on-device recommendations, named SemiDFEGL, which introduces new device-to-device collaborations to improve scalability and reduce communication costs and innovatively utilizes predicted interacted item nodes to connect isolated ego graphs to augment local subgraphs such that the high-order user-item collaborative information could be used in a privacy-preserving manner. Furthermore, the proposed framework is model-agnostic, meaning that it could be seamlessly integrated with existing graph neural network-based recommendation methods and privacy protection techniques. To validate the effectiveness of the proposed SemiDFEGL, extensive experiments are conducted on three public datasets, and the results demonstrate the superiority of the proposed SemiDFEGL compared to other federated recommendation methods.

xGCN: An Extreme Graph Convolutional Network for Large-scale Social Link Prediction

Graph neural networks (GNNs) have seen widespread usage across multiple real-world applications, yet in transductive learning, they still face challenges in accuracy, efficiency, and scalability, due to the extensive number of trainable parameters in the embedding table and the paradigm of stacking neighborhood aggregations. This paper presents a novel model called xGCN for large-scale network embedding, which is a practical solution for link predictions. xGCN addresses these issues by encoding graph-structure data in an extreme convolutional manner, and has the potential to push the performance of network embedding-based link predictions to a new record. Specifically, instead of assigning each node with a directly learnable embedding vector, xGCN regards node embeddings as static features. It uses a propagation operation to smooth node embeddings and relies on a Refinement neural Network (RefNet) to transform the coarse embeddings derived from the unsupervised propagation into new ones that optimize a training objective. The output of RefNet, which are well-refined embeddings, will replace the original node embeddings. This process is repeated iteratively until the model converges to a satisfying status. Experiments on three social network datasets with link prediction tasks show that xGCN not only achieves the best accuracy compared with a series of competitive baselines but also is highly efficient and scalable.

SINCERE: Sequential Interaction Networks representation learning on Co-Evolving RiEmannian manifolds

Sequential interaction networks (SIN) have been commonly adopted in many applications such as recommendation systems, search engines and social networks to describe the mutual influence between users and items/products. Efforts on representing SIN are mainly focused on capturing the dynamics of networks in Euclidean space, and recently plenty of work has extended to hyperbolic geometry for implicit hierarchical learning. Previous approaches which learn the embedding trajectories of users and items achieve promising results. However, there are still a range of fundamental issues remaining open. For example, is it appropriate to place user and item nodes in one identical space regardless of their inherent discrepancy? Instead of residing in a single fixed curvature space, how will the representation spaces evolve when new interaction occurs?

To explore these implication for sequential interaction networks, we propose SINCERE, a novel method representing Sequential Interaction Networks on Co-Evolving RiEmannian manifolds. SINCERE not only takes the user and item embedding trajectories in respective spaces into account, but also emphasizes on the space evolvement that how curvature changes over time. Specifically, we introduce a fresh cross-geometry aggregation which allows us to propagate information across different Riemannian manifolds without breaking conformal invariance, and a curvature estimator which is delicately designed to predict global curvatures effectively according to current local Ricci curvatures. Extensive experiments on several real-world datasets demonstrate the promising performance of SINCERE over the state-of-the-art sequential interaction prediction methods.

PARROT: Position-Aware Regularized Optimal Transport for Network Alignment

Network alignment is a critical steppingstone behind a variety of multi-network mining tasks. Most of the existing methods essentially optimize a Frobenius-like distance or ranking-based loss, ignoring the underlying geometry of graph data. Optimal transport (OT), together with Wasserstein distance, has emerged to be a powerful approach accounting for the underlying geometry explicitly. Promising as it might be, the state-of-the-art OT-based alignment methods suffer from two fundamental limitations, including (1) effectiveness due to the insufficient use of topology and consistency information and (2) scalability due to the non-convex formulation and repeated computationally costly loss calculation. In this paper, we propose a position-aware regularized optimal transport framework for network alignment named PARROT. To tackle the effectiveness issue, the proposed PARROT captures topology information by random walk with restart, with three carefully designed consistency regularization terms. To tackle the scalability issue, the regularized OT problem is decomposed into a series of convex subproblems and can be efficiently solved by the proposed constrained proximal point method with guaranteed convergence. Extensive experiments show that our algorithm achieves significant improvements in both effectiveness and scalability, outperforming the state-of-the-art network alignment methods and speeding up existing OT-based methods by up to 100 times.

Joint Internal Multi-Interest Exploration and External Domain Alignment for Cross Domain Sequential Recommendation

Sequential Cross-Domain Recommendation (CDR) has been popularly studied to utilize different domain knowledge and users’ historical behaviors for the next-item prediction. In this paper, we focus on the cross-domain sequential recommendation problem. This commonly exist problem is rather challenging from two perspectives, i.e., the implicit user historical rating sequences are difficult in modeling and the users/items on different domains are mostly non-overlapped. Most previous sequential CDR approaches cannot solve the cross-domain sequential recommendation problem well, since (1) they cannot sufficiently depict the users’ actual preferences, (2) they cannot leverage and transfer useful knowledge across domains. To tackle the above issues, we propose joint Internal multi-interest exploration and External domain alignment for cross domain Sequential Recommendation model (IESRec). IESRec includes two main modules, i.e., internal multi-interest exploration module and external domain alignment module. To reflect the users’ diverse characteristics with multi-interests evolution, we first propose internal temporal optimal transport method in the internal multi-interest exploration module. We further propose external alignment optimal transport method in the external domain alignment module to reduce domain discrepancy for the item embeddings. Our empirical studies on Amazon datasets demonstrate that IESRec significantly outperforms the state-of-the-art models.

Graph Neural Network with Two Uplift Estimators for Label-Scarcity Individual Uplift Modeling

Uplift modeling aims to measure the incremental effect, which we call uplift, of a strategy or action on the users from randomized experiments or observational data. Most existing uplift methods only use individual data, which are usually not informative enough to capture the unobserved and complex hidden factors regarding the uplift. Furthermore, uplift modeling scenario usually has scarce labeled data, especially for the treatment group, which also poses a great challenge for model training. Considering that the neighbors’ features and the social relationships are very informative to characterize a user’s uplift, we propose a graph neural network-based framework with two uplift estimators, called GNUM, to learn from the social graph for uplift estimation. Specifically, we design the first estimator based on a class-transformed target. The estimator is general for all types of outcomes, and is able to comprehensively model the treatment and control group data together to approach the uplift. When the outcome is discrete, we further design the other uplift estimator based on our defined partial labels, which is able to utilize more labeled data from both the treatment and control groups, to further alleviate the label scarcity problem. Comprehensive experiments on a public dataset and two industrial datasets show a superior performance of our proposed framework over state-of-the-art methods under various evaluation metrics. The proposed algorithms have been deployed online to serve real-world uplift estimation scenarios.

Label Information Enhanced Fraud Detection against Low Homophily in Graphs

Node classification is a substantial problem in graph-based fraud detection. Many existing works adopt Graph Neural Networks (GNNs) to enhance fraud detectors. While promising, currently most GNN-based fraud detectors fail to generalize to the low homophily setting. Besides, label utilization has been proved to be significant factor for node classification problem. But we find they are less effective in fraud detection tasks due to the low homophily in graphs. In this work, we propose GAGA, a novel Group AGgregation enhanced TrAnsformer, to tackle the above challenges. Specifically, the group aggregation provides a portable method to cope with the low homophily issue. Such an aggregation explicitly integrates the label information to generate distinguishable neighborhood information. Along with group aggregation, an attempt towards end-to-end trainable group encoding is proposed which augments the original feature space with the class labels. Meanwhile, we devise two additional learnable encodings to recognize the structural and relational context. Then, we combine the group aggregation and the learnable encodings into a Transformer encoder to capture the semantic information. Experimental results clearly show that GAGA outperforms other competitive graph-based fraud detectors by up to 24.39% on two trending public datasets and a real-world industrial dataset from Baidu. Even more, the group aggregation is demonstrated to outperform other label utilization methods (e.g., C&S, BoT/UniMP) in the low homophily setting.

GraphPrompt: Unifying Pre-Training and Downstream Tasks for Graph Neural Networks

Graphs can model complex relationships between objects, enabling a myriad of Web applications such as online page/article classification and social recommendation. While graph neural networks (GNNs) have emerged as a powerful tool for graph representation learning, in an end-to-end supervised setting, their performance heavily relies on a large amount of task-specific supervision. To reduce labeling requirement, the “pre-train, fine-tune” and “pre-train, prompt” paradigms have become increasingly common. In particular, prompting is a popular alternative to fine-tuning in natural language processing, which is designed to narrow the gap between pre-training and downstream objectives in a task-specific manner. However, existing study of prompting on graphs is still limited, lacking a universal treatment to appeal to different downstream tasks. In this paper, we propose GraphPrompt, a novel pre-training and prompting framework on graphs. GraphPrompt not only unifies pre-training and downstream tasks into a common task template, but also employs a learnable prompt to assist a downstream task in locating the most relevant knowledge from the pre-trained model in a task-specific manner. Finally, we conduct extensive experiments on five public datasets to evaluate and analyze GraphPrompt.

An Attentional Multi-scale Co-evolving Model for Dynamic Link Prediction

Dynamic link prediction is essential for a wide range of domains, including social networks, bioinformatics, knowledge bases, and recommender systems. Existing works have demonstrated that structural information and temporal information are two of the most important information for this problem. However, existing works either focus on modeling them independently or modeling the temporal dynamics of a single structural scale, neglecting the complex correlations among them. This paper proposes to model the inherent correlations among the evolving dynamics of different structural scales for dynamic link prediction. Following this idea, we propose an Attentional Multi-scale Co-evolving Network (AMCNet). Specifically, We model multi-scale structural information by a motif-based graph neural network with multi-scale pooling. Then, we design a hierarchical attention-based sequence-to-sequence model for learning the complex correlations among the evolution dynamics of different structural scales. Extensive experiments on four real-world datasets with different characteristics demonstrate that AMCNet significantly outperforms the state-of-the-art in both single-step and multi-step dynamic link prediction tasks.

Robust Graph Representation Learning for Local Corruption Recovery

The performance of graph representation learning is affected by the quality of graph input. While existing research usually pursues a globally smoothed graph embedding, we believe the rarely observed anomalies are as well harmful to an accurate prediction. This work establishes a graph learning scheme that automatically detects (locally) corrupted feature attributes and recovers robust embedding for prediction tasks. The detection operation leverages a graph autoencoder, which does not make any assumptions about the distribution of the local corruptions. It pinpoints the positions of the anomalous node attributes in an unbiased mask matrix, where robust estimations are recovered with sparsity promoting regularizer. The optimizer approaches a new embedding that is sparse in the framelet domain and conditionally close to input observations. Extensive experiments are provided to validate our proposed model can recover a robust graph representation from black-box poisoning and achieve excellent performance.

Intra and Inter Domain HyperGraph Convolutional Network for Cross-Domain Recommendation

Cross-Domain Recommendation (CDR) aims to solve the data sparsity problem by integrating the strengths of different domains. Though researchers have proposed various CDR methods to effectively transfer knowledge across domains, they fail to address the following key issues, i.e., (1) they cannot model high-order correlations among users and items in every single domain to obtain more accurate representations; (2) they cannot model the correlations among items across different domains. To tackle the above issues, we propose a novel Intra and Inter Domain HyperGraph Convolutional Network (II-HGCN) framework, which includes two main layers in the modeling process, i.e., the intra-domain layer and the inter-domain layer. In the intra-domain layer, we design a user hypergraph and an item hypergraph to model high-order correlations inside every single domain. Thus we can address the data sparsity problem better and learn high-quality representations of users and items. In the inter-domain layer, we propose an inter-domain hypergraph structure to explore correlations among items from different domains based on their interactions with common users. Therefore we can not only transfer the knowledge of users but also combine embeddings of items across domains. Comprehensive experiments on three widely used benchmark datasets demonstrate that II-HGCN outperforms other state-of-the-art methods, especially when datasets are extremely sparse.

Hyperbolic Geometric Graph Representation Learning for Hierarchy-imbalance Node Classification

Learning unbiased node representations for imbalanced samples in the graph has become a more remarkable and important topic. For the graph, a significant challenge is that the topological properties of the nodes (e.g., locations, roles) are unbalanced (topology-imbalance), other than the number of training labeled nodes (quantity-imbalance). Existing studies on topology-imbalance focus on the location or the local neighborhood structure of nodes, ignoring the global underlying hierarchical properties of the graph, i.e., hierarchy. In the real-world scenario, the hierarchical structure of graph data reveals important topological properties of graphs and is relevant to a wide range of applications. We find that training labeled nodes with different hierarchical properties have a significant impact on the node classification tasks and confirm it in our experiments. It is well known that hyperbolic geometry has a unique advantage in representing the hierarchical structure of graphs. Therefore, we attempt to explore the hierarchy-imbalance issue for node classification of graph neural networks with a novelty perspective of hyperbolic geometry, including its characteristics and causes. Then, we propose a novel hyperbolic geometric hierarchy-imbalance learning framework, named HyperIMBA, to alleviate the hierarchy-imbalance issue caused by uneven hierarchy-levels and cross-hierarchy connectivity patterns of labeled nodes. Extensive experimental results demonstrate the superior effectiveness of HyperIMBA for hierarchy-imbalance node classification tasks.

Graph Neural Networks without Propagation

Due to the simplicity, intuition and explanation, most Graph Neural Networks (GNNs) are proposed by following the pipeline of message passing. Although they achieve superior performances in many tasks, propagation-based GNNs possess three essential drawbacks. Firstly, the propagation tends to produce smooth effect, which meets the inductive bias of homophily, and causes two serious issues: over-smoothing issue and performance drop on networks with heterophily. Secondly, the propagations to each node are irrelevant, which prevents GNNs from modeling high-order relation, and cause the GNNs fragile to the attributes noises. Thirdly, propagation-based GNNs may be fragile to topology noise, since they heavily relay on propagation over the topology. Therefore, the propagation, as the key component of most GNNs, may be the essence of some serious issues in GNNs. To get to the root of these issue, this paper attempts to replace the propagation with a novel local operation. Quantitative experimental analysis reveals: 1) the existence of low-rank characteristic in the node attributes from ego-networks and 2) the performance improvement by reducing its rank. Motivated by this finding, this paper propose the Low-Rank GNNs, whose key component is the low-rank attribute matrix approximation in ego-network. The graph topology is employed to construct the ego-networks instead of message propagation, which is sensitive to topology noises. The proposed Low-Rank GNNs posses some attractive characteristics, including robust to topology and attribute noises, parameter-free and parallelizable. Experimental evaluations demonstrate the superior performance, robustness to noises and universality of the proposed Low-Rank GNNs.

TIGER: Temporal Interaction Graph Embedding with Restarts

Temporal interaction graphs (TIGs), consisting of sequences of timestamped interaction events, are prevalent in fields like e-commerce and social networks. To better learn dynamic node embeddings that vary over time, researchers have proposed a series of temporal graph neural networks for TIGs. However, due to the entangled temporal and structural dependencies, existing methods have to process the sequence of events chronologically and consecutively to ensure node representations are up-to-date. This prevents existing models from parallelization and reduces their flexibility in industrial applications. To tackle the above challenge, in this paper, we propose TIGER, a TIG embedding model that can restart at any timestamp. We introduce a restarter module that generates surrogate representations acting as the warm initialization of node representations. By restarting from multiple timestamps simultaneously, we divide the sequence into multiple chunks and naturally enable the parallelization of the model. Moreover, in contrast to previous models that utilize a single memory unit, we introduce a dual memory module to better exploit neighborhood information and alleviate the staleness problem. Extensive experiments on four public datasets and one industrial dataset are conducted, and the results verify both the effectiveness and the efficiency of our work.

Self-Supervised Teaching and Learning of Representations on Graphs

Recent years have witnessed significant advances in graph contrastive learning (GCL), while most GCL models use graph neural networks as encoders based on supervised learning. In this work, we propose a novel graph learning model called GraphTL, which explores self-supervised teaching and learning of representations on graphs. One critical objective of GCL is to retain original graph information. For this purpose, we design an encoder based on the idea of unsupervised dimensionality reduction of locally linear embedding (LLE). Specifically, we map one iteration of the LLE to one layer of the network. To guide the encoder to better retain the original graph information, we propose an unbalanced contrastive model consisting of two views, which are the learning view and the teaching view, respectively. Furthermore, we consider the nodes that are identical in muti-views as positive node pairs, and design the node similarity scorer so that the model can select positive samples of a target node. Extensive experiments have been conducted over multiple datasets to evaluate the performance of GraphTL in comparison with baseline models. Results demonstrate that GraphTL can reduce distances between similar nodes while preserving network topological and feature information, yielding better performance in node classification.

SE-GSL: A General and Effective Graph Structure Learning Framework through Structural Entropy Optimization

Graph Neural Networks (GNNs) are de facto solutions to structural data learning. However, it is susceptible to low-quality and unreliable structure, which has been a norm rather than an exception in real-world graphs. Existing graph structure learning (GSL) frameworks still lack robustness and interpretability. This paper proposes a general GSL framework, SE-GSL, through structural entropy and the graph hierarchy abstracted in the encoding tree. Particularly, we exploit the one-dimensional structural entropy to maximize embedded information content when auxiliary neighbourhood attributes is fused to enhance the original graph. A new scheme of constructing optimal encoding trees are proposed to minimize the uncertainty and noises in the graph whilst assuring proper community partition in hierarchical abstraction. We present a novel sample-based mechanism for restoring the graph structure via node structural entropy distribution. It increases the connectivity among nodes with larger uncertainty in lower-level communities. SE-GSL is compatible with various GNN models and enhances the robustness towards noisy and heterophily structures. Extensive experiments show significant improvements in the effectiveness and robustness of structure learning and node representation learning.

Homophily-oriented Heterogeneous Graph Rewiring

With the rapid development of the World Wide Web (WWW), heterogeneous graphs (HG) have explosive growth. Recently, heterogeneous graph neural network (HGNN) has shown great potential in learning on HG. Current studies of HGNN mainly focus on some HGs with strong homophily properties (nodes connected by meta-path tend to have the same labels), while few discussions are made in those that are less homophilous. Recently, there have been many works on homogeneous graphs with heterophily. However, due to heterogeneity, it is non-trivial to extend their approach to deal with HGs with heterophily. In this work, based on empirical observations, we propose a meta-path-induced metric to measure the homophily degree of a HG. We also find that current HGNNs may have degenerated performance when handling HGs with less homophilous properties. Thus it is essential to increase the generalization ability of HGNNs on non-homophilous HGs. To this end, we propose HDHGR, a homophily-oriented deep heterogeneous graph rewiring approach that modifies the HG structure to increase the performance of HGNN. We theoretically verify HDHGR. In addition, experiments on real-world HGs demonstrate the effectiveness of HDHGR, which brings at most more than 10% relative gain.

HGWaveNet: A Hyperbolic Graph Neural Network for Temporal Link Prediction

Temporal link prediction, aiming to predict future edges between paired nodes in a dynamic graph, is of vital importance in diverse applications. However, existing methods are mainly built upon uniform Euclidean space, which has been found to be conflict with the power-law distributions of real-world graphs and unable to represent the hierarchical connections between nodes effectively. With respect to the special data characteristic, hyperbolic geometry offers an ideal alternative due to its exponential expansion property. In this paper, we propose HGWaveNet, a novel hyperbolic graph neural network that fully exploits the fitness between hyperbolic spaces and data distributions for temporal link prediction. Specifically, we design two key modules to learn the spatial topological structures and temporal evolutionary information separately. On the one hand, a hyperbolic diffusion graph convolution (HDGC) module effectively aggregates information from a wider range of neighbors. On the other hand, the internal order of causal correlation between historical states is captured by hyperbolic dilated causal convolution (HDCC) modules. The whole model is built upon the hyperbolic spaces to preserve the hierarchical structural information in the entire data flow. To prove the superiority of HGWaveNet, extensive experiments are conducted on six real-world graph datasets and the results show a relative improvement by up to 6.67% on AUC for temporal link prediction over SOTA methods.

Rethinking Structural Encodings: Adaptive Graph Transformer for Node Classification Task

Graph Transformers have proved their advantages in graph data mining with elaborate Positional Encodings, especially in graph-level tasks. However, their application in the node classification task has not been fully exploited yet. In the node classification task, existing Graph Transformers with Positional Encodings are limited by the following issues: (i) PEs describing the node’s positional identities are insufficient for the node classification task on complex graphs, where a full portrayal of the local node property is needed. (ii) PEs for graphs are integrated with Transformers in a constant schema, resulting in the ignorance of local patterns that may vary among different nodes. In this paper, we propose Adaptive Graph Transformer (AGT) to tackle above issues. AGT consists of a Learnable Centrality Encoding and a Kernelized Local Structure Encoding. The two modules extract structural patterns from centrality and subgraph views in a learnable and scalable manner. Further, we design the Adaptive Transformer Block to adaptively integrate the attention scores and Structural Encodings in a node-specific manner. AGT achieves state-of-the-art performances on nine real-world web graphs (up to 1.6 million nodes). Furthermore, AGT shows outstanding results on two series of synthetic graphs with ranges of heterophily and noise ratios.

CMINet: a Graph Learning Framework for Content-aware Multi-channel Influence Diffusion

The phenomena of influence diffusion on social networks have received tremendous research interests in the past decade. While most prior works mainly focus on predicting the total influence spread on a single network, a marketing campaign that exploits influence diffusion often involves multiple channels with various information disseminated on different media. In this paper, we introduce a new influence estimation problem, namely Content-aware Multi-channel Influence Diffusion (CMID), and accordingly propose CMINet to predict newly influenced users, given a set of seed users with different multimedia contents. In CMINet, we first introduce DiffGNN to encode the influencing power of users (nodes) and Influence-aware Optimal Transport (IOT) to align the embeddings to address the distribution shift across different diffusion channels. Then, we transform CMID into a node classification problem and propose Social-based Multimedia Feature Extractor (SMFE) and Content-aware Multi-channel Influence Propagation (CMIP) to jointly learn the user preferences on multimedia contents and predict the susceptibility of users. Furthermore, we prove that CMINet preserves monotonicity and submodularity, thus enabling (1 − 1/e)-approximate solutions for influence maximization. Experimental results manifest that CMINet outperforms eleven baselines on three public datasets.

Federated Node Classification over Graphs with Latent Link-type Heterogeneity

Federated learning (FL) aims to train powerful and generalized global models without putting distributed data together, which has been shown effective in various domains of machine learning. The non-IIDness of data across local clients has been a major challenge for FL. In graphs, one specifically important perspective of non-IIDness is manifested in the link-type heterogeneity underlying homogeneous graphs– the seemingly uniform links captured in most real-world networks can carry different levels of homophily or semantics of relations, while the exact sets and distributions of such latent link-types can further differ across local clients. Through our preliminary data analysis, we are motivated to design a new graph FL framework that can simultaneously discover latent link-types and model message-passing w.r.t. the discovered link-types through the collaboration of distributed local clients. Specifically, we propose a framework FedLit that can dynamically detect the latent link-types during FL via an EM-based clustering algorithm and differentiate the message-passing through different types of links via multiple convolution channels. For experiments, we synthesize multiple realistic datasets of graphs with latent heterogeneous link-types from real-world data, and partition them with different levels of link-type heterogeneity. Comprehensive experimental results and in-depth analysis have demonstrated both superior performance and rational behaviors of our proposed techniques.

Expressive and Efficient Representation Learning for Ranking Links in Temporal Graphs

Temporal graph representation learning (T-GRL) aims to learn representations that model how graph edges evolve over time. While recent works on T-GRL have improved link prediction accuracy in temporal settings, their methods optimize a point-wise loss function independently over future links rather than optimize jointly over a candidate set per node. In applications where resources (e.g., attention) are allocated based on ranking links by likelihood, the use of a ranking loss is preferred. However it is not straightforward to develop a T-GRL method to optimize a ranking loss due to a tradeoff between model expressivity and scalability. In this work, we address these issues and propose a Temporal Graph network for Ranking (TGRank), which significantly improves performance for link prediction tasks by (i) optimizing a list-wise loss for improved ranking, and (ii) incorporating a labeling approach designed to allow for efficient inference over the candidate set jointly, while provably boosting expressivity. We extensively evaluate TGRank over six real networks. TGRank outperforms the state-of-the-art baselines on average by 14.21%↑ (transductive) and 16.25% ↑ (inductive) in ranking metrics while being more efficient (up-to 65 × speed-up) to make inference on large networks.

Semi-Supervised Embedding of Attributed Multiplex Networks

Complex information can be represented as networks (graphs) characterized by a large number of nodes, multiple types of nodes, and multiple types of relationships between them, i.e. multiplex networks. Additionally, these networks are enriched with different types of node features.

We propose a Semi-supervised Embedding approach for Attributed Multiplex Networks (SSAMN), to jointly embed nodes, node attributes, and node labels of multiplex networks in a low dimensional space. Network embedding techniques have garnered research attention for real-world applications. However, most existing techniques solely focus on learning the node embeddings, and only a few learn class label embeddings. Our method assumes that we have different classes of nodes and that we know the class label of some, very few nodes for every class. Guided by this type of supervision, SSAMN learns a low-dimensional representation incorporating all information in a large labeled multiplex network. SSAMN integrates techniques from Spectral Embedding and Homogeneity Analysis to improve the embedding of nodes, node attributes, and node labels. Our experiments demonstrate that we only need very few labels per class in order to have a final embedding that preservers the information of the graph. To evaluate the performance of SSAMN, we run experiments on four real-world datasets. The results show that our approach outperforms state-of-the-art methods for downstream tasks such as semi-supervised node classification and node clustering.

Search to Capture Long-range Dependency with Stacking GNNs for Graph Classification

In recent years, Graph Neural Networks (GNNs) have been popular in the graph classification task. Currently, shallow GNNs are more common due to the well-known over-smoothing problem facing deeper GNNs. However, they are sub-optimal without utilizing the information from distant nodes, i.e., the long-range dependencies. The mainstream methods in the graph classification task can extract the long-range dependencies either by designing the pooling operations or incorporating the higher-order neighbors, while they have evident drawbacks by modifying the original graph structure, which may result in information loss in graph structure learning. In this paper, by justifying the smaller influence of the over-smoothing problem in the graph classification task, we evoke the importance of stacking-based GNNs and then employ them to capture the long-range dependencies without modifying the original graph structure. To achieve this, two design needs are given for stacking-based GNNs, i.e., sufficient model depth and adaptive skip-connection schemes. By transforming the two design needs into designing data-specific inter-layer connections, we propose a novel approach with the help of neural architecture search (NAS), which is dubbed LRGNN (Long-Range Graph Neural Networks). Extensive experiments on five datasets show that the proposed LRGNN can achieve the best performance, and obtained data-specific GNNs with different depth and skip-connection schemes, which can better capture the long-range dependencies. 1

HINormer: Representation Learning On Heterogeneous Information Networks with Graph Transformer

Recent studies have highlighted the limitations of message-passing based graph neural networks (GNNs), e.g., limited model expressiveness, over-smoothing, over-squashing, etc. To alleviate these issues, Graph Transformers (GTs) have been proposed which work in the paradigm that allows message passing to a larger coverage even across the whole graph. Hinging on the global range attention mechanism, GTs have shown a superpower for representation learning on homogeneous graphs. However, the investigation of GTs on heterogeneous information networks (HINs) is still under-exploited. In particular, on account of the existence of heterogeneity, HINs show distinct data characteristics and thus require different treatment. To bridge this gap, in this paper we investigate the representation learning on HINs with Graph Transformer, and propose a novel model named HINormer, which capitalizes on a larger-range aggregation mechanism for node representation learning. In particular, assisted by two major modules, i.e., a local structure encoder and a heterogeneous relation encoder, HINormer can capture both the structural and heterogeneous information of nodes on HINs for comprehensive node representations. We conduct extensive experiments on four HIN benchmark datasets, which demonstrate that our proposed model can outperform the state-of-the-art.

Auto-HeG: Automated Graph Neural Network on Heterophilic Graphs

Graph neural architecture search (NAS) has gained popularity in automatically designing powerful graph neural networks (GNNs) with relieving human efforts. However, existing graph NAS methods mainly work under the homophily assumption and overlook another important graph property, i.e., heterophily, which exists widely in various real-world applications. To date, automated heterophilic graph learning with NAS is still a research blank to be filled in. Due to the complexity and variety of heterophilic graphs, the critical challenge of heterophilic graph NAS mainly lies in developing the heterophily-specific search space and strategy. Therefore, in this paper, we propose a novel automated graph neural network on heterophilic graphs, namely Auto-HeG, to automatically build heterophilic GNN models with expressive learning abilities. Specifically, Auto-HeG incorporates heterophily into all stages of automatic heterophilic graph learning, including search space design, supernet training, and architecture selection. Through the diverse message-passing scheme with joint micro-level and macro-level designs, we first build a comprehensive heterophilic GNN search space, enabling Auto-HeG to integrate complex and various heterophily of graphs. With a progressive supernet training strategy, we dynamically shrink the initial search space according to layer-wise variation of heterophily, resulting in a compact and efficient supernet. Taking a heterophily-aware distance criterion as the guidance, we conduct heterophilic architecture selection in the leave-one-out pattern, so that specialized and expressive heterophilic GNN architectures can be derived. Extensive experiments illustrate the superiority of Auto-HeG in developing excellent heterophilic GNNs to human-designed models and graph NAS models.

Generating Counterfactual Hard Negative Samples for Graph Contrastive Learning

Graph contrastive learning has emerged as a powerful unsupervised graph representation learning tool. The key to the success of graph contrastive learning is to acquire high-quality positive and negative samples as contrasting pairs to learn the underlying structural semantics of the input graph. Recent works usually sample negative samples from the same training batch with the positive samples or from an external irrelevant graph. However, a significant limitation lies in such strategies: the unavoidable problem of sampling false negative samples. In this paper, we propose a novel method to utilize Counterfactual mechanism to generate artificial hard negative samples for Graph Contrastive learning, namely CGC. We utilize a counterfactual mechanism to produce hard negative samples, ensuring that the generated samples are similar but have labels that differ from the positive sample. The proposed method achieves satisfying results on several datasets. It outperforms some traditional unsupervised graph learning methods and some SOTA graph contrastive learning methods. We also conducted some supplementary experiments to illustrate the proposed method, including the performances of CGC with different hard negative samples and evaluations for hard negative samples generated with different similarity measurements. The implementation code is available online to ease reproducibility1.

Minimum Topology Attacks for Graph Neural Networks

With the great popularity of Graph Neural Networks (GNNs), their robustness to adversarial topology attacks has received significant attention. Although many attack methods have been proposed, they mainly focus on fixed-budget attacks, aiming at finding the most adversarial perturbations within a fixed budget for target node. However, considering the varied robustness of each node, there is an inevitable dilemma caused by the fixed budget, i.e., no successful perturbation is found when the budget is relatively small, while if it is too large, the yielding redundant perturbations will hurt the invisibility. To break this dilemma, we propose a new type of topology attack, named minimum-budget topology attack, aiming to adaptively find the minimum perturbation sufficient for a successful attack on each node. To this end, we propose an attack model, named MiBTack, based on a dynamic projected gradient descent algorithm, which can effectively solve the involving non-convex constraint optimization on discrete topology. Extensive results on three GNNs and four real-world datasets show that MiBTack can successfully lead all target nodes misclassified with the minimum perturbation edges. Moreover, the obtained minimum budget can be used to measure node robustness, so we can explore the relationships of robustness, topology, and uncertainty for nodes, which is beyond what the current fixed-budget topology attacks can offer.

Multi-head Variational Graph Autoencoder Constrained by Sum-product Networks

Variational graph autoencoder (VGAE) is a promising deep probabilistic model in graph representation learning. However, most existing VGAEs adopt the mean-field assumption, and cannot characterize the graphs with noise well. In this paper, we propose a novel deep probabilistic model for graph analysis, termed Multi-head Variational Graph Autoencoder Constrained by Sum-product Networks (named SPN-MVGAE), which helps to relax the mean-field assumption and learns better latent representation with fault tolerance. Our proposed model SPN-MVGAE uses conditional sum-product networks as constraints to learn the dependencies between latent factors in an end-to-end manner. Furthermore, we introduce the superposition of the latent representations learned by multiple variational networks to represent the final latent representations of nodes. Our model is the first use sum-product networks for graph representation learning, extending the scope of sum-product networks applications. Experimental results show that compared with other baseline methods, our model has competitive advantages in link prediction, fault tolerance, node classification, and graph visualization on real datasets.

GIF: A General Graph Unlearning Strategy via Influence Function

With the greater emphasis on privacy and security in our society, the problem of graph unlearning — revoking the influence of specific data on the trained GNN model, is drawing increasing attention. However, ranging from machine unlearning to recently emerged graph unlearning methods, existing efforts either resort to retraining paradigm, or perform approximate erasure that fails to consider the inter-dependency between connected neighbors or imposes constraints on GNN structure, therefore hard to achieve satisfying performance-complexity trade-offs.

In this work, we explore the influence function tailored for graph unlearning, so as to improve the unlearning efficacy and efficiency for graph unlearning. We first present a unified problem formulation of diverse graph unlearning tasks w.r.t. node, edge, and feature. Then, we recognize the crux to the inability of traditional influence function for graph unlearning, and devise Graph Influence Function (GIF), a model-agnostic unlearning method that can efficiently and accurately estimate parameter changes in response to a ϵ -mass perturbation in deleted data. The idea is to supplement the objective of the traditional influence function with an additional loss term of the influenced neighbors due to the structural dependency. Further deductions on the closed-form solution of parameter changes provide a better understanding of the unlearning mechanism. We conduct extensive experiments on four representative GNN models and three benchmark datasets to justify the superiority of GIF for diverse graph unlearning tasks in terms of unlearning efficacy, model utility, and unlearning efficiency. Our implementations are available at https://github.com/wujcan/GIF-torch/.

Learning Mixtures of Markov Chains with Quality Guarantees

A large number of modern applications ranging from listening songs online and browsing the Web to using a navigation app on a smartphone generate a plethora of user trails. Clustering such trails into groups with a common sequence pattern can reveal significant structure in human behavior that can lead to improving user experience through better recommendations, and even prevent suicides [14]. One approach to modeling this problem mathematically is as a mixture of Markov chains. Recently, Gupta, Kumar and Vassilvitski [10] introduced an algorithm () based on the singular value decomposition (SVD) that under certain conditions can perfectly recover a mixture of L chains on n states, given only the distribution of trails of length 3 (3-trail).

In this work we contribute to the problem of unmixing Markov chains by highlighting and addressing two important constraints of the algorithm [10]: some chains in the mixture may not even be weakly connected, and secondly in practice one does not know beforehand the true number of chains. We resolve these issues in the Gupta et al. paper [10]. Specifically, we propose an algebraic criterion that enables us to choose a value of L efficiently that avoids overfitting. Furthermore, we design a reconstruction algorithm that outputs the true mixture in the presence of disconnected chains and is robust to noise. We complement our theoretical results with experiments on both synthetic and real data, where we observe that our method outperforms the algorithm. Finally, we empirically observe that combining EM with our method performs best in practice, both in terms of reconstruction error with respect to the distribution of 3-trails and the mixture of Markov chains.

INCREASE: Inductive Graph Representation Learning for Spatio-Temporal Kriging

Spatio-temporal kriging is an important problem in web and social applications, such as Web or Internet of Things, where things (e.g., sensors) connected into a web often come with spatial and temporal properties. It aims to infer knowledge for (the things at) unobserved locations using the data from (the things at) observed locations during a given time period of interest. This problem essentially requires inductive learning. Once trained, the model should be able to perform kriging for different locations including newly given ones, without retraining. However, it is challenging to perform accurate kriging results because of the heterogeneous spatial relations and diverse temporal patterns. In this paper, we propose a novel inductive graph representation learning model for spatio-temporal kriging. We first encode heterogeneous spatial relations between the unobserved and observed locations by their spatial proximity, functional similarity, and transition probability. Based on each relation, we accurately aggregate the information of most correlated observed locations to produce inductive representations for the unobserved locations, by jointly modeling their similarities and differences. Then, we design relation-aware gated recurrent unit (GRU) networks to adaptively capture the temporal correlations in the generated sequence representations for each relation. Finally, we propose a multi-relation attention mechanism to dynamically fuse the complex spatio-temporal information at different time steps from multiple relations to compute the kriging output. Experimental results on three real-world datasets show that our proposed model outperforms state-of-the-art methods consistently, and the advantage is more significant when there are fewer observed locations. Our code is available at https://github.com/zhengchuanpan/INCREASE.

Dual Intent Enhanced Graph Neural Network for Session-based New Item Recommendation

Recommender systems are essential to various fields, e.g., e-commerce, e-learning, and streaming media. At present, graph neural networks (GNNs) for session-based recommendations normally can only recommend items existing in users’ historical sessions. As a result, these GNNs have difficulty recommending items that users have never interacted with (new items), which leads to a phenomenon of information cocoon. Therefore, it is necessary to recommend new items to users. As there is no interaction between new items and users, we cannot include new items when building session graphs for GNN session-based recommender systems. Thus, it is challenging to recommend new items for users when using GNN-based methods. We regard this challenge as “GNN Session-based New Item Recommendation (GSNIR)”. To solve this problem, we propose a dual-intent enhanced graph neural network for it. Due to the fact that new items are not tied to historical sessions, the users’ intent is difficult to predict. We design a dual-intent network to learn user intent from an attention mechanism and the distribution of historical data respectively, which can simulate users’ decision-making process in interacting with a new item. To solve the challenge that new items cannot be learned by GNNs, inspired by zero-shot learning (ZSL), we infer the new item representation in GNN space by using their attributes. By outputting new item probabilities, which contain recommendation scores of the corresponding items, the new items with higher scores are recommended to users. Experiments on two representative real-world datasets show the superiority of our proposed method. The case study from the real-world verifies interpretability benefits brought by the dual-intent module and the new item reasoning module.

Cut-matching Games for Generalized Hypergraph Ratio Cuts

Many social networks and web-based datasets are characterized by multiway interactions (e.g., groups of co-purchased online retail products or group conversations in Q&A forums) and hypergraph clustering is a fundamental primitive for analyzing these types of interactions. We present an O(log n)-approximation algorithm for a broad class of hypergraph ratio cut objectives. This includes objectives involving generalized hypergraph cut functions, which allow a user to penalize cut hyperedges differently depending on the number of nodes in each cluster. Our method generalizes the cut-matching framework for graph ratio cuts, and relies only on solving maximum s-t flow problems in a special reduced graph. It is significantly faster than existing hypergraph ratio cut algorithms, while also solving a more general problem. In numerical experiments on various web-based hypergraphs, we show that it quickly finds ratio cut solutions within a small factor of optimality.

Toward Degree Bias in Embedding-Based Knowledge Graph Completion

A fundamental task for knowledge graphs (KGs) is knowledge graph completion (KGC). It aims to predict unseen edges by learning representations for all the entities and relations in a KG. A common concern when learning representations on traditional graphs is degree bias. It can affect graph algorithms by learning poor representations for lower-degree nodes, often leading to low performance on such nodes. However, there has been limited research on whether there exists degree bias for embedding-based KGC and how such bias affects the performance of KGC. In this paper, we validate the existence of degree bias in embedding-based KGC and identify the key factor to degree bias. We then introduce a novel data augmentation method, KG-Mixup, to generate synthetic triples to mitigate such bias. Extensive experiments have demonstrated that our method can improve various embedding-based KGC methods and outperform other methods tackling the bias problem on multiple benchmark datasets. 1

Unlearning Graph Classifiers with Limited Data Resources

As the demand for user privacy grows, controlled data removal (machine unlearning) is becoming an important feature of machine learning models for data-sensitive Web applications such as social networks and recommender systems. Nevertheless, at this point it is still largely unknown how to perform efficient machine unlearning of graph neural networks (GNNs); this is especially the case when the number of training samples is small, in which case unlearning can seriously compromise the performance of the model. To address this issue, we initiate the study of unlearning the Graph Scattering Transform (GST), a mathematical framework that is efficient, provably stable under feature or graph topology perturbations, and offers graph classification performance comparable to that of GNNs. Our main contribution is the first known nonlinear approximate graph unlearning method based on GSTs. Our second contribution is a theoretical analysis of the computational complexity of the proposed unlearning mechanism, which is hard to replicate for deep neural networks. Our third contribution are extensive simulation results which show that, compared to complete retraining of GNNs after each removal request, the new GST-based approach offers, on average, a 10.38x speed-up and leads to a 2.6% increase in test accuracy during unlearning of 90 out of 100 training graphs from the IMDB dataset (10% training ratio). Our implementation is available online at https://doi.org/10.5281/zenodo.7613150.

KGTrust: Evaluating Trustworthiness of SIoT via Knowledge Enhanced Graph Neural Networks

Social Internet of Things (SIoT), a promising and emerging paradigm that injects the notion of social networking into smart objects (i.e., things), paving the way for the next generation of Internet of Things. However, due to the risks and uncertainty, a crucial and urgent problem to be settled is establishing reliable relationships within SIoT, that is, trust evaluation. Graph neural networks for trust evaluation typically adopt a straightforward way such as one-hot or node2vec to comprehend node characteristics, which ignores the valuable semantic knowledge attached to nodes. Moreover, the underlying structure of SIoT is usually complex, including both the heterogeneous graph structure and pairwise trust relationships, which renders hard to preserve the properties of SIoT trust during information propagation. To address these aforementioned problems, we propose a novel knowledge-enhanced graph neural network (KGTrust) for better trust evaluation in SIoT. Specifically, we first extract useful knowledge from users’ comment behaviors and external structured triples related to object descriptions, in order to gain a deeper insight into the semantics of users and objects. Furthermore, we introduce a discriminative convolutional layer that utilizes heterogeneous graph structure, node semantics, and augmented trust relationships to learn node embeddings from the perspective of a user as a trustor or a trustee, effectively capturing multi-aspect properties of SIoT trust during information propagation. Finally, a trust prediction layer is developed to estimate the trust relationships between pairwise nodes. Extensive experiments on three public datasets illustrate the superior performance of KGTrust over state-of-the-art methods.

GraphMAE2: A Decoding-Enhanced Masked Self-Supervised Graph Learner

Graph self-supervised learning (SSL), including contrastive and generative approaches, offers great potential to address the fundamental challenge of label scarcity in real-world graph data. Among both sets of graph SSL techniques, the masked graph autoencoders (e.g., GraphMAE)—one type of generative methods—have recently produced promising results. The idea behind this is to reconstruct the node features (or structures)—that are randomly masked from the input—with the autoencoder architecture. However, the performance of masked feature reconstruction naturally relies on the discriminability of the input features and is usually vulnerable to disturbance in the features. In this paper, we present a masked self-supervised learning framework1 GraphMAE2 with the goal of overcoming this issue. The idea is to impose regularization on feature reconstruction for graph SSL. Specifically, we design the strategies of multi-view random re-mask decoding and latent representation prediction to regularize the feature reconstruction. The multi-view random re-mask decoding is to introduce randomness into reconstruction in the feature space, while the latent representation prediction is to enforce the reconstruction in the embedding space. Extensive experiments show that GraphMAE2 can consistently generate top results on various public datasets, including at least 2.45% improvements over state-of-the-art baselines on ogbn-Papers100M with 111M nodes and 1.6B edges.

CogDL: A Comprehensive Library for Graph Deep Learning

Graph neural networks (GNNs) have attracted tremendous attention from the graph learning community in recent years. It has been widely adopted in various real-world applications from diverse domains, such as social networks and biological graphs. The research and applications of graph deep learning present new challenges, including the sparse nature of graph data, complicated training of GNNs, and non-standard evaluation of graph tasks. To tackle the issues, we present CogDL1, a comprehensive library for graph deep learning that allows researchers and practitioners to conduct experiments, compare methods, and build applications with ease and efficiency. In CogDL, we propose a unified design for the training and evaluation of GNN models for various graph tasks, making it unique among existing graph learning libraries. By utilizing this unified trainer, CogDL can optimize the GNN training loop with several training techniques, such as mixed precision training. Moreover, we develop efficient sparse operators for CogDL, enabling it to become the most competitive graph library for efficiency. Another important CogDL feature is its focus on ease of use with the aim of facilitating open and reproducible research of graph learning. We leverage CogDL to report and maintain benchmark results on fundamental graph tasks, which can be reproduced and directly used by the community.

ApeGNN: Node-Wise Adaptive Aggregation in GNNs for Recommendation

In recent years, graph neural networks (GNNs) have made great progress in recommendation. The core mechanism of GNNs-based recommender system is to iteratively aggregate neighboring information on the user-item interaction graph. However, existing GNNs treat users and items equally and cannot distinguish diverse local patterns of each node, which makes them suboptimal in the recommendation scenario. To resolve this challenge, we present a node-wise adaptive graph neural network framework ApeGNN. ApeGNN develops a node-wise adaptive diffusion mechanism for information aggregation, in which each node is enabled to adaptively decide its diffusion weights based on the local structure (e.g., degree). We perform experiments on six widely-used recommendation datasets. The experimental results show that the proposed ApeGNN is superior to the most advanced GNN-based recommender methods (up to 48.94%), demonstrating the effectiveness of node-wise adaptive aggregation.

SESSION: User Modeling and Personalization

Enhancing User Personalization in Conversational Recommenders

Conversational recommenders are emerging as a powerful tool to personalize a user’s recommendation experience. Through a back-and-forth dialogue, users can quickly hone in on just the right items. Many approaches to conversational recommendation, however, only partially explore the user preference space and make limiting assumptions about how user feedback can be best incorporated, resulting in long dialogues and poor recommendation performance. In this paper, we propose a novel conversational recommendation framework with two unique features: (i) a greedy NDCG attribute selector, to enhance user personalization in the interactive preference elicitation process by prioritizing attributes that most effectively represent the actual preference space of the user; and (ii) a user representation refiner, to effectively fuse together the user preferences collected from the interactive elicitation process to obtain a more personalized understanding of the user. Through extensive experiments on four frequently used datasets, we find the proposed framework not only outperforms all the state-of-the-art conversational recommenders (in terms of both recommendation performance and conversation efficiency), but also provides a more personalized experience for the user under the proposed multi-groundtruth multi-round conversational recommendation setting.

LINet: A Location and Intention-Aware Neural Network for Hotel Group Recommendation

Motivated by the collaboration with Fliggy1, a leading Online Travel Platform (OTP), we investigate an important but less explored research topic about optimizing the quality of hotel supply, namely selecting potential profitable hotels in advance to build up adequate room inventory. We formulate a WWW problem, i.e., within a specific time period (When) and potential travel area (Where), which hotels should be recommended to a certain group of users with similar travel intentions (Why). We identify three critical challenges in solving the WWW problem: user groups generation, travel data sparsity and utilization of hotel recommendation information (e.g., period, location and intention). To this end, we propose LINet, a Location and Intention-aware neural Network for hotel group recommendation. Specifically, LINet first identifies user travel intentions for user groups generalization, and then characterizes the group preferences by jointly considering historical user-hotel interaction and spatio-temporal features of hotels. For data sparsity, we develop a graph neural network, which employs long-term data, and further design an auxiliary loss function of location that efficiently exploits data within the same and across different locations. Both offline and online experiments demonstrate the effectiveness of LINet when compared with state-of-the-art methods. LINet has been successfully deployed on Fliggy to retrieve high quality hotels for business development, serving hundreds of hotel operation scenarios and thousands of hotel operators.

Multi-Modal Self-Supervised Learning for Recommendation

The online emergence of multi-modal sharing platforms (e.g., TikTok, Youtube) is powering personalized recommender systems to incorporate various modalities (e.g., visual, textual and acoustic) into the latent user representations. While existing works on multi-modal recommendation exploit multimedia content features in enhancing item embeddings, their model representation capability is limited by heavy label reliance and weak robustness on sparse user behavior data. Inspired by the recent progress of self-supervised learning in alleviating label scarcity issue, we explore deriving self-supervision signals with effectively learning of modality-aware user preference and cross-modal dependencies. To this end, we propose a new Multi-Modal Self-Supervised Learning (MMSSL) method which tackles two key challenges. Specifically, to characterize the inter-dependency between the user-item collaborative view and item multi-modal semantic view, we design a modality-aware interactive structure learning paradigm via adversarial perturbations for data augmentation. In addition, to capture the effects that user’s modality-aware interaction pattern would interweave with each other, a cross-modal contrastive learning approach is introduced to jointly preserve the inter-modal semantic commonality and user preference diversity. Experiments on real-world datasets verify the superiority of our method in offering great potential for multimedia recommendation over various state-of-the-art baselines. The implementation is released at: https://github.com/HKUDS/MMSSL.

Distillation from Heterogeneous Models for Top-K Recommendation

Recent recommender systems have shown remarkable performance by using an ensemble of heterogeneous models. However, it is exceedingly costly because it requires resources and inference latency proportional to the number of models, which remains the bottleneck for production. Our work aims to transfer the ensemble knowledge of heterogeneous teachers to a lightweight student model using knowledge distillation (KD), to reduce the huge inference costs while retaining high accuracy. Through an empirical study, we find that the efficacy of distillation severely drops when transferring knowledge from heterogeneous teachers. Nevertheless, we show that an important signal to ease the difficulty can be obtained from the teacher’s training trajectory. This paper proposes a new KD framework, named HetComp, that guides the student model by transferring easy-to-hard sequences of knowledge generated from the teachers’ trajectories. To provide guidance according to the student’s learning state, HetComp uses dynamic knowledge construction to provide progressively difficult ranking knowledge and adaptive knowledge transfer to gradually transfer finer-grained ranking information. Our comprehensive experiments show that HetComp significantly improves the distillation quality and the generalization of the student model.

On the Theories Behind Hard Negative Sampling for Recommendation

Negative sampling has been heavily used to train recommender models on large-scale data, wherein sampling hard examples usually not only accelerates the convergence but also improves the model accuracy. Nevertheless, the reasons for the effectiveness of Hard Negative Sampling (HNS) have not been revealed yet. In this work, we fill the research gap by conducting thorough theoretical analyses on HNS. Firstly, we prove that employing HNS on the Bayesian Personalized Ranking (BPR) learner is equivalent to optimizing One-way Partial AUC (OPAUC). Concretely, the BPR equipped with Dynamic Negative Sampling (DNS) is an exact estimator, while with softmax-based sampling is a soft estimator. Secondly, we prove that OPAUC has a stronger connection with Top-K evaluation metrics than AUC and verify it with simulation experiments. These analyses establish the theoretical foundation of HNS in optimizing Top-K recommendation performance for the first time. On these bases, we offer two insightful guidelines for effective usage of HNS: 1) the sampling hardness should be controllable, e.g., via pre-defined hyper-parameters, to adapt to different Top-K metrics and datasets; 2) the smaller the K we emphasize in Top-K evaluation metrics, the harder the negative samples we should draw. Extensive experiments on three real-world benchmarks verify the two guidelines.

Fine-tuning Partition-aware Item Similarities for Efficient and Scalable Recommendation

Collaborative filtering (CF) is widely searched in recommendation with various types of solutions. Recent success of Graph Convolution Networks (GCN) in CF demonstrates the effectiveness of modeling high-order relationships through graphs, while repetitive graph convolution and iterative batch optimization limit their efficiency. Instead, item similarity models attempt to construct direct relationships through efficient interaction encoding. Despite their great performance, the growing item numbers result in quadratic growth in similarity modeling process, posing critical scalability problems. In this paper, we investigate the graph sampling strategy adopted in latest GCN model for efficiency improving, and identify the potential item group structure in the sampled graph. Based on this, we propose a novel item similarity model which introduces graph partitioning to restrict the item similarity modeling within each partition. Specifically, we show that the spectral information of the original graph is well in preserving global-level information. Then, it is added to fine-tune local item similarities with a new data augmentation strategy acted as partition-aware prior knowledge, jointly to cope with the information loss brought by partitioning. Experiments carried out on 4 datasets show that the proposed model outperforms state-of-the-art GCN models with 10x speed-up and item similarity models with 95% parameter storage savings.

Exploration and Regularization of the Latent Action Space in Recommendation

In recommender systems, reinforcement learning solutions have effectively boosted recommendation performance because of their ability to capture long-term user-system interaction. However, the action space of the recommendation policy is a list of items, which could be extremely large with a dynamic candidate item pool. To overcome this challenge, we propose a hyper-actor and critic learning framework where the policy decomposes the item list generation process into a hyper-action inference step and an effect-action selection step. The first step maps the given state space into a vectorized hyper-action space, and the second step selects the item list based on the hyper-action. In order to regulate the discrepancy between the two action spaces, we design an alignment module along with a kernel mapping function for items to ensure inference accuracy and include a supervision module to stabilize the learning process. We build simulated environments on public datasets and empirically show that our framework is superior in recommendation compared to standard RL baselines.

Bootstrap Latent Representations for Multi-modal Recommendation

This paper studies the multi-modal recommendation problem, where the item multi-modality information (e.g., images and textual descriptions) is exploited to improve the recommendation accuracy. Besides the user-item interaction graph, existing state-of-the-art methods usually use auxiliary graphs (e.g., user-user or item-item relation graph) to augment the learned representations of users and/or items. These representations are often propagated and aggregated on auxiliary graphs using graph convolutional networks, which can be prohibitively expensive in computation and memory, especially for large graphs. Moreover, existing multi-modal recommendation methods usually leverage randomly sampled negative examples in Bayesian Personalized Ranking (BPR) loss to guide the learning of user/item representations, which increases the computational cost on large graphs and may also bring noisy supervision signals into the training process. To tackle the above issues, we propose a novel self-supervised multi-modal recommendation model, dubbed BM3, which requires neither augmentations from auxiliary graphs nor negative samples. Specifically, BM3 first bootstraps latent contrastive views from the representations of users and items with a simple dropout augmentation. It then jointly optimizes three multi-modal objectives to learn the representations of users and items by reconstructing the user-item interaction graph and aligning modality features under both inter- and intra-modality perspectives. BM3 alleviates both the need for contrasting with negative examples and the complex graph augmentation from an additional target network for contrastive view generation. We show BM3 outperforms prior recommendation models on three datasets with number of nodes ranging from 20K to 200K, while achieving a 2-9 × reduction in training time. Code implementation is located at: https://github.com/enoche/BM3.

Tracing Knowledge Instead of Patterns: Stable Knowledge Tracing with Diagnostic Transformer

Knowledge Tracing (KT) aims at tracing the evolution of the knowledge states along the learning process of a learner. It has become a crucial task for online learning systems to model the learning process of their users, and further provide their users a personalized learning guidance. However, recent developments in KT based on deep neural networks mostly focus on increasing the accuracy of predicting the next performance of students. We argue that current KT modeling, as well as training paradigm, can lead to models tracing patterns of learner’s learning activities, instead of their evolving knowledge states. In this paper, we propose a new architecture, Diagnostic Transformer (DTransformer), along with a new training paradigm, to tackle this challenge. With DTransformer, we build the architecture from question-level to knowledge-level, explicitly diagnosing learner’s knowledge proficiency from each question mastery states. We also propose a novel training algorithm based on contrastive learning that focuses on maintaining the stability of the knowledge state diagnosis. Through extensive experiments, we will show that with its understanding of knowledge state evolution, DTransformer achieves a better performance prediction accuracy and more stable knowledge state tracing results. We will also show that DTransformer is less sensitive to specific patterns with case study. We open-sourced our code and data at https://github.com/yxonic/DTransformer.

Two-Stage Constrained Actor-Critic for Short Video Recommendation

The wide popularity of short videos on social media poses new opportunities and challenges to optimize recommender systems on the video-sharing platforms. Users sequentially interact with the system and provide complex and multi-faceted responses, including WatchTime  and various types of interactions with multiple videos. On the one hand, the platforms aim at optimizing the users’ cumulative WatchTime  (main goal) in the long term, which can be effectively optimized by Reinforcement Learning. On the other hand, the platforms also need to satisfy the constraint of accommodating the responses of multiple user interactions (auxiliary goals) such as Like, Follow, Share, etc. In this paper, we formulate the problem of short video recommendation as a Constrained Markov Decision Process (CMDP). We find that traditional constrained reinforcement learning algorithms fail to work well in this setting. We propose a novel two-stage constrained actor-critic method: At stage one, we learn individual policies to optimize each auxiliary signal. In stage two, we learn a policy to (i) optimize the main signal and (ii) stay close to policies learned in the first stage, which effectively guarantees the performance of this main policy on the auxiliaries. Through extensive offline evaluations, we demonstrate the effectiveness of our method over alternatives in both optimizing the main goal as well as balancing the others. We further show the advantage of our method in live experiments of short video recommendations, where it significantly outperforms other baselines in terms of both WatchTime  and interactions. Our approach has been fully launched in the production system to optimize user experiences on the platform.

Recommendation with Causality enhanced Natural Language Explanations

Explainable recommendation has recently attracted increasing attention from both academic and industry communities. Among different explainable strategies, generating natural language explanations is an important method, which can deliver more informative, flexible and readable explanations to facilitate better user decisions. Despite the effectiveness, existing models are mostly optimized based on the observed datasets, which can be skewed due to the selection or exposure bias. To alleviate this problem, in this paper, we formulate the task of explainable recommendation with a causal graph, and design a causality enhanced framework to generate unbiased explanations. More specifically, we firstly define an ideal unbiased learning objective, and then derive a tractable loss for the observational data based on the inverse propensity score (IPS), where the key is a sample re-weighting strategy for equalizing the loss and ideal objective in expectation. Considering that the IPS estimated from the sparse and noisy recommendation datasets can be inaccurate, we introduce a fault tolerant mechanism by minimizing the maximum loss induced by the sample weights near the IPS. For more comprehensive modeling, we further analyze and infer the potential latent confounders induced by the complex and diverse user personalities. We conduct extensive experiments by comparing with the state-of-the-art methods based on three real-world datasets to demonstrate the effectiveness of our method.

Cross-domain recommendation via user interest alignment

Cross-domain recommendation aims to leverage knowledge from multiple domains to alleviate the data sparsity and cold-start problems in traditional recommender systems. One popular paradigm is to employ overlapping user representations to establish domain connections, thereby improving recommendation performance in all scenarios. Nevertheless, the general practice of this approach is to train user embeddings in each domain separately and then aggregate them in a plain manner, often ignoring potential cross-domain similarities between users and items. Furthermore, considering that their training objective is recommendation task-oriented without specific regularizations, the optimized embeddings disregard the interest alignment among user’s views, and even violate the user’s original interest distribution. To address these challenges, we propose a novel cross-domain recommendation framework, namely COAST, to improve recommendation performance on dual domains by perceiving the cross-domain similarity between entities and aligning user interests. Specifically, we first construct a unified cross-domain heterogeneous graph and redefine the message passing mechanism of graph convolutional networks to capture high-order similarity of users and items across domains. Targeted at user interest alignment, we develop deep insights from two more fine-grained perspectives of user-user and user-item interest invariance across domains by virtue of affluent unsupervised and semantic signals. We conduct intensive experiments on multiple tasks, constructed from two large recommendation data sets. Extensive results show COAST consistently and significantly outperforms state-of-the-art cross-domain recommendation algorithms as well as classic single-domain recommendation methods.

Robust Recommendation with Adversarial Gaussian Data Augmentation

Recommender system holds the promise of accurately understanding and estimating the user preferences. However, due to the extremely sparse user-item interactions, the learned recommender models can be less robust and sensitive to the highly dynamic user preferences and easily changed recommendation environments. To alleviate this problem, in this paper, we propose a simple yet effective robust recommender framework by generating additional samples from the Gaussian distributions. In specific, we design two types of data augmentation strategies. For the first one, we directly produce the data based on the original samples, where we simulate the generation process in the latent space. For the second one, we firstly change the original samples towards the direction of maximizing the loss function, and then produce the data based on the altered samples to make more effective explorations. Based on both of the above strategies, we leverage adversarial training to optimize the recommender model with the generated data which can achieve the largest losses. In addition, we theoretically analyze our framework, and find that the above two data augmentation strategies equal to impose a gradient based regularization on the original recommender models. We conduct extensive experiments based on six real-world datasets to demonstrate the effectiveness of our framework.

Learning to Simulate Daily Activities via Modeling Dynamic Human Needs

Daily activity data that records individuals’ various types of activities in daily life are widely used in many applications such as activity scheduling, activity recommendation, and policymaking. Though with high value, its accessibility is limited due to high collection costs and potential privacy issues. Therefore, simulating human activities to produce massive high-quality data is of great importance to benefit practical applications. However, existing solutions, including rule-based methods with simplified assumptions of human behavior and data-driven methods directly fitting real-world data, both cannot fully qualify for matching reality. In this paper, motivated by the classic psychological theory, Maslow’s need theory describing human motivation, we propose a knowledge-driven simulation framework based on generative adversarial imitation learning. To enhance the fidelity and utility of the generated activity data, our core idea is to model the evolution of human needs as the underlying mechanism that drives activity generation in the simulation model. Specifically, this is achieved by a hierarchical model structure that disentangles different need levels, and the use of neural stochastic differential equations that successfully captures piecewise-continuous characteristics of need dynamics. Extensive experiments demonstrate that our framework outperforms the state-of-the-art baselines in terms of data fidelity and utility. Besides, we present the insightful interpretability of the need modeling. The code is available at https://github.com/tsinghua-fib-lab/Activity-Simulation-SAND.

Dual-interest Factorization-heads Attention for Sequential Recommendation

Accurate user interest modeling is vital for recommendation scenarios. One of the effective solutions is the sequential recommendation that relies on click behaviors, but this is not elegant in the video feed recommendation where users are passive in receiving the streaming contents and return skip or no-skip behaviors. Here skip and no-skip behaviors can be treated as negative and positive feedback, respectively. With the mixture of positive and negative feedback, it is challenging to capture the transition pattern of behavioral sequence. To do so, FeedRec has exploited a shared vanilla Transformer, which may be inelegant because head interaction of multi-heads attention does not consider different types of feedback. In this paper, we propose Dual-interest Factorization-heads Attention for Sequential Recommendation (short for DFAR) consisting of feedback-aware encoding layer, dual-interest disentangling layer and prediction layer. In the feedback-aware encoding layer, we first suppose each head of multi-heads attention can capture specific feedback relations. Then we further propose factorization-heads attention which can mask specific head interaction and inject feedback information so as to factorize the relation between different types of feedback. Additionally, we propose a dual-interest disentangling layer to decouple positive and negative interests before performing disentanglement on their representations. Finally, we evolve the positive and negative interests by corresponding towers whose outputs are contrastive by BPR loss. Experiments on two real-world datasets show the superiority of our proposed method against state-of-the-art baselines. Further ablation study and visualization also sustain its effectiveness. We release the source code here: https://github.com/tsinghua-fib-lab/WWW2023-DFAR.

Contrastive Collaborative Filtering for Cold-Start Item Recommendation

The cold-start problem is a long-standing challenge in recommender systems. As a promising solution, content-based generative models usually project a cold-start item’s content onto a warm-start item embedding to capture collaborative signals from item content so that collaborative filtering can be applied. However, since the training of the cold-start recommendation models is conducted on warm datasets, the existent methods face the issue that the collaborative embeddings of items will be blurred, which significantly degenerates the performance of cold-start item recommendation. To address this issue, we propose a novel model called Contrastive Collaborative Filtering for Cold-start item Recommendation (CCFCRec), which capitalizes on the co-occurrence collaborative signals in warm training data to alleviate the issue of blurry collaborative embeddings for cold-start item recommendation. In particular, we devise a contrastive collaborative filtering (CF) framework, consisting of a content CF module and a co-occurrence CF module to generate the content-based collaborative embedding and the co-occurrence collaborative embedding for a training item, respectively. During the joint training of the two CF modules, we apply a contrastive learning between the two collaborative embeddings, by which the knowledge about the co-occurrence signals can be indirectly transferred to the content CF module, so that the blurry collaborative embeddings can be rectified implicitly by the memorized co-occurrence collaborative signals during the applying phase. Together with the sound theoretical analysis, the extensive experiments conducted on real datasets demonstrate the superiority of the proposed model. The codes and datasets are available on https://github.com/zzhin/CCFCRec.

Anti-FakeU: Defending Shilling Attacks on Graph Neural Network based Recommender Model

Graph neural network (GNN) based recommendation models are observed to be more vulnerable against carefully-designed malicious records injected into the system, i.e., shilling attacks, which manipulate the recommendation to common users and therefore impair user trust. In this paper, we for the first time conduct a systematic study on the vulnerability of GNN based recommendation model against the shilling attack. With the aid of theoretical analysis, we attribute the root cause of the vulnerability to its neighborhood aggregation mechanism, which could make the negative impact of attacks propagate rapidly in the system. To restore the robustness of GNN based recommendation model, the key factor lies in detecting malicious records in the system and preventing the propagation of misinformation. To this end, we construct a user-user graph to capture the patterns of malicious behaviors and design a novel GNN based detector to identify fake users. Furthermore, we develop a data augmentation strategy and a joint learning paradigm to train the recommender model and the proposed detector. Extensive experiments on benchmark datasets validate the enhanced robustness of the proposed method in resisting various types of shilling attacks and identifying fake users, e.g., our proposed method fully mitigating the impact of popularity attacks on target items up to , and improving the accuracy of detecting fake users on the Gowalla dataset by .

Controllable Universal Fair Representation Learning

Learning fair and transferable representations of users that can be used for a wide spectrum of downstream tasks (specifically, machine learning models) has great potential in fairness-aware Web services. Existing studies focus on debiasing w.r.t. a small scale of (one or a handful of) fixed pre-defined sensitive attributes. However, in real practice, downstream data users can be interested in various protected groups and these are usually not known as prior. This requires the learned representations to be fair w.r.t. all possible sensitive attributes. We name this task universal fair representation learning, in which an exponential number of sensitive attributes need to be dealt with, bringing the challenges of unreasonable computational cost and un-guaranteed fairness constraints. To address these problems, we propose a controllable universal fair representation learning (CUFRL) method. An effective bound is first derived via the lens of mutual information to guarantee parity of the universal set of sensitive attributes while maintaining the accuracy of downstream tasks. We also theoretically establish that the number of sensitive attributes that need to be processed can be reduced from exponential to linear. Experiments on two public real-world datasets demonstrate CUFRL can achieve significantly better accuracy-fairness trade-off compared with baseline approaches.

Compressed Interaction Graph based Framework for Multi-behavior Recommendation

Multi-types of user behavior data (e.g., clicking, adding to cart, and purchasing) are recorded in most real-world recommendation scenarios, which can help to learn users’ multi-faceted preferences. However, it is challenging to explore multi-behavior data due to the unbalanced data distribution and sparse target behavior, which lead to the inadequate modeling of high-order relations when treating multi-behavior data “as features” and gradient conflict in multi-task learning when treating multi-behavior data “as labels”. In this paper, we propose CIGF, a Compressed Interaction Graph based Framework, to overcome the above limitations. Specifically, we design a novel Compressed Interaction Graph Convolution Network (CIGCN) to model instance-level high-order relations explicitly. To alleviate the potential gradient conflict when treating multi-behavior data “as labels”, we propose a Multi-Expert with Separate Input (MESI) network with separate input on the top of CIGCN for multi-task learning. Comprehensive experiments on three large-scale real-world datasets demonstrate the superiority of CIGF.

A Counterfactual Collaborative Session-based Recommender System

Most session-based recommender systems (SBRSs) focus on extracting information from the observed items in the current session of a user to predict a next item, ignoring the causes outside the session (called outer-session causes, OSCs) that influence the user’s selection of items. However, these causes widely exist in the real world, and few studies have investigated their role in SBRSs. In this work, we analyze the causalities and correlations of the OSCs in SBRSs from the perspective of causal inference. We find that the OSCs are essentially the confounders in SBRSs, which leads to spurious correlations in the data used to train SBRS models. To address this problem, we propose a novel SBRS framework named COCO-SBRS (COunterfactual COllaborative Session-Based Recommender Systems) to learn the causality between OSCs and user-item interactions in SBRSs. COCO-SBRS first adopts a self-supervised approach to pre-train a recommendation model by designing pseudo-labels of causes for each user’s selection of the item in data to guide the training process. Next, COCO-SBRS adopts counterfactual inference to recommend items based on the outputs of the pre-trained recommendation model considering the causalities to alleviate the data sparsity problem. As a result, COCO-SBRS can learn the causalities in data, preventing the model from learning spurious correlations. The experimental results of our extensive experiments conducted on three real-world datasets demonstrate the superiority of our proposed framework over ten representative SBRSs.

Correlative Preference Transfer with Hierarchical Hypergraph Network for Multi-Domain Recommendation

Advanced recommender systems usually involve multiple domains (such as scenarios or categories) for various marketing strategies, and users interact with them to satisfy diverse demands. The goal of multi-domain recommendation (MDR) is to improve the recommendation performance of all domains simultaneously. Conventional graph neural network based methods usually deal with each domain separately, or train a shared model to serve all domains. The former fails to leverage users’ cross-domain behaviors, making the behavior sparseness issue a great obstacle. The latter learns shared user representation with respect to all domains, which neglects users’ domain-specific preferences. In this paper we propose , a hierarchical hypergraph network based correlative preference transfer framework for MDR, which represents multi-domain user-item interactions into a unified graph to help preference transfer. incorporates two hyperedge-based modules, namely dynamic item transfer (Hyper-I) and adaptive user aggregation (Hyper-U). Hyper-I extracts correlative information from multi-domain user-item feedbacks for eliminating domain discrepancy of item representations. Hyper-U aggregates users’ scattered preferences in multiple domains and further exploits the high-order (not only pair-wise) connections to improve user representations. Experiments on both public and production datasets verify the superiority of for MDR.

Automated Self-Supervised Learning for Recommendation

Graph neural networks (GNNs) have emerged as the state-of-the-art paradigm for collaborative filtering (CF). To improve the representation quality over limited labeled data, contrastive learning has attracted attention in recommendation and benefited graph-based CF model recently. However, the success of most contrastive methods heavily relies on manually generating effective contrastive views for heuristic-based data augmentation. This does not generalize across different datasets and downstream recommendation tasks, which is difficult to be adaptive for data augmentation and robust to noise perturbation. To fill this crucial gap, this work proposes a unified Automated Collaborative Filtering (AutoCF) to automatically perform data augmentation for recommendation. Specifically, we focus on the generative self-supervised learning framework with a learnable augmentation paradigm that benefits the automated distillation of important self-supervised signals. To enhance the representation discrimination ability, our masked graph autoencoder is designed to aggregate global information during the augmentation via reconstructing the masked subgraph structures. Experiments and ablation studies are performed on several public datasets for recommending products, venues, and locations. Results demonstrate the superiority of AutoCF against various baseline methods. We release the model implementation at https://github.com/HKUDS/AutoCF.

AutoDenoise: Automatic Data Instance Denoising for Recommendations

Historical user-item interaction datasets are essential in training modern recommender systems for predicting user preferences. However, the arbitrary user behaviors in most recommendation scenarios lead to a large volume of noisy data instances being recorded, which cannot fully represent their true interests. While a large number of denoising studies are emerging in the recommender system community, all of them suffer from highly dynamic data distributions. In this paper, we propose a Deep Reinforcement Learning (DRL) based framework, AutoDenoise, with an Instance Denoising Policy Network, for denoising data instances with an instance selection manner in deep recommender systems. To be specific, AutoDenoise serves as an agent in DRL to adaptively select noise-free and predictive data instances, which can then be utilized directly in training representative recommendation models. In addition, we design an alternate two-phase optimization strategy to train and validate the AutoDenoise properly. In the searching phase, we aim to train the policy network with the capacity of instance denoising; in the validation phase, we find out and evaluate the denoised subset of data instances selected by the trained policy network, so as to validate its denoising ability. We conduct extensive experiments to validate the effectiveness of AutoDenoise combined with multiple representative recommender system models.

Improving Recommendation Fairness via Data Augmentation

Collaborative filtering based recommendation learns users’ preferences from all users’ historical behavior data, and has been popular to facilitate decision making. Recently, the fairness issue of recommendation has become more and more essential. A recommender system is considered unfair when it does not perform equally well for different user groups according to users’ sensitive attributes (e.g., gender, race). Plenty of methods have been proposed to alleviate unfairness by optimizing a predefined fairness goal or changing the distribution of unbalanced training data. However, they either suffered from the specific fairness optimization metrics or relied on redesigning the current recommendation architecture. In this paper, we study how to improve recommendation fairness from the data augmentation perspective. The recommendation model amplifies the inherent unfairness of imbalanced training data. We augment imbalanced training data towards balanced data distribution to improve fairness. Given each real original user-item interaction record, we propose the following hypotheses for augmenting the training data: each user in one group has a similar item preference (click or non-click) as the item preference of any user in the remaining group. With these hypotheses, we generate “fake" interaction behaviors to complement the original training data. After that, we design a bi-level optimization target, with the inner optimization generates better fake data to augment training data with our hypotheses, and the outer one updates the recommendation model parameters based on the augmented training data. The proposed framework is generally applicable to any embedding-based recommendation, and does not need to pre-define a fairness metric. Extensive experiments on two real-world datasets clearly demonstrate the superiority of our proposed framework. We publish the source code at https://github.com/newlei/FDA.

ColdNAS: Search to Modulate for User Cold-Start Recommendation

Making personalized recommendation for cold-start users, who only have a few interaction histories, is a challenging problem in recommendation systems. Recent works leverage hypernetworks to directly map user interaction histories to user-specific parameters, which are then used to modulate predictor by feature-wise linear modulation function. These works obtain the state-of-the-art performance. However, the physical meaning of scaling and shifting in recommendation data is unclear. Instead of using a fixed modulation function and deciding modulation position by expertise, we propose a modulation framework called ColdNAS for user cold-start problem, where we look for proper modulation structure, including function and position, via neural architecture search. We design a search space which covers broad models and theoretically prove that this search space can be transformed to a much smaller space, enabling an efficient and robust one-shot search algorithm. Extensive experimental results on benchmark datasets show that ColdNAS consistently performs the best. We observe that different modulation functions lead to the best performance on different datasets, which validates the necessity of designing a searching-based method. Codes are available at https://github.com/LARS-research/ColdNAS.

AutoS2AE: Automate to Regularize Sparse Shallow Autoencoders for Recommendation

The Embarrassingly Shallow Autoencoders (EASE and SLIM) are strong recommendation methods based on implicit feedback, compared to competing methods like iALS and VAE-CF. However, EASE suffers from several major shortcomings. First, the training and inference of EASE can not scale with the increasing number of items since it requires storing and inverting a large dense matrix; Second, though its optimization objective – the square loss– can yield a closed-form solution, it is not consistent with recommendation goal – predicting a personalized ranking on a set of items, so that its performance is far from optimal w.r.t ranking-oriented recommendation metrics. Finally, the regularization coefficients are sensitive w.r.t recommendation accuracy and vary a lot across different datasets, so the fine-tuning of these parameters is important yet time-consuming. To improve training and inference efficiency, we propose a Similarity-Structure Aware Shallow Autoencoder on top of three similarity structures, including Co-Occurrence, KNN and NSW. We then optimize the model with a weighted square loss, which is proven effective for ranking-based recommendation but still capable of deriving closed-form solutions. However, the weight in the loss can not be learned in the training set and is similarly sensitive w.r.t the accuracy to regularization coefficients. To automatically tune the hyperparameters, we design two validation losses on the validation set for guidance, and update the hyperparameters with the gradient of the validation losses. We finally evaluate the proposed method on multiple real-world datasets and show that it outperforms seven competing baselines remarkably, and verify the effectiveness of each part in the proposed method.

Quantize Sequential Recommenders Without Private Data

Deep neural networks have achieved great success in sequential recommendation systems. While maintaining high competence in user modeling and next-item recommendation, these models have long been plagued by the numerous parameters and computation, which inhibit them to be deployed on resource-constrained mobile devices. Model quantization, as one of the main paradigms for compression techniques, converts float parameters to low-bit values to reduce parameter redundancy and accelerate inference. To avoid drastic performance degradation, it usually requests a fine-tuning phase with an original dataset. However, the training set of user-item interactions is not always available due to transmission limits or privacy concerns. In this paper, we propose a novel framework to quantize sequential recommenders without access to any real private data. A generator is employed in the framework to synthesize fake sequence samples to feed the quantized sequential recommendation model and minimize the gap with a full-precision sequential recommendation model. The generator and the quantized model are optimized with a min-max game — alternating discrepancy estimation and knowledge transfer. Moreover, we devise a two-level discrepancy modeling strategy to transfer information between the quantized model and the full-precision model. The extensive experiments of various recommendation networks on three public datasets demonstrate the effectiveness of the proposed framework.

Interaction-level Membership Inference Attack Against Federated Recommender Systems

The marriage of federated learning and recommender system (FedRec) has been widely used to address the growing data privacy concerns in personalized recommendation services. In FedRecs, users’ attribute information and behavior data (i.e., user-item interaction data) are kept locally on their personal devices, therefore, it is considered a fairly secure approach to protect user privacy. As a result, the privacy issue of FedRecs is rarely explored. Unfortunately, several recent studies reveal that FedRecs are vulnerable to user attribute inference attacks, highlighting the privacy concerns of FedRecs. In this paper, we further investigate the privacy problem of user behavior data (i.e., user-item interactions) in FedRecs. Specifically, we perform the first systematic study on interaction-level membership inference attacks on FedRecs. An interaction-level membership inference attacker is first designed, and then the classical privacy protection mechanism, Local Differential Privacy (LDP), is adopted to defend against the membership inference attack. Unfortunately, the empirical analysis shows that LDP is not effective against such new attacks unless the recommendation performance is largely compromised. To mitigate the interaction-level membership attack threats, we design a simple yet effective defense method to significantly reduce the attacker’s inference accuracy without losing recommendation performance. Extensive experiments are conducted with two widely used FedRecs (Fed-NCF and Fed-LightGCN) on three real-world recommendation datasets (MovieLens-100K, Steam-200K, and Amazon Cell Phone), and the experimental results show the effectiveness of our solutions.

Debiased Contrastive Learning for Sequential Recommendation

Current sequential recommender systems are proposed to tackle the dynamic user preference learning with various neural techniques, such as Transformer and Graph Neural Networks (GNNs). However, inference from the highly sparse user behavior data may hinder the representation ability of sequential pattern encoding. To address the label shortage issue, contrastive learning (CL) methods are proposed recently to perform data augmentation in two fashions: (i) randomly corrupting the sequence data (e.g., stochastic masking, reordering); (ii) aligning representations across pre-defined contrastive views. Although effective, we argue that current CL-based methods have limitations in addressing popularity bias and disentangling of user conformity and real interest. In this paper, we propose a new Debiased Contrastive learning paradigm for Recommendation (DCRec) that unifies sequential pattern encoding with global collaborative relation modeling through adaptive conformity-aware augmentation. This solution is designed to tackle the popularity bias issue in recommendation systems. Our debiased contrastive learning framework effectively captures both the patterns of item transitions within sequences and the dependencies between users across sequences. Our experiments on various real-world datasets have demonstrated that DCRec significantly outperforms state-of-the-art baselines, indicating its efficacy for recommendation. To facilitate reproducibility of our results, we make our implementation of DCRec publicly available at: https://github.com/HKUDS/DCRec.

Clustered Embedding Learning for Recommender Systems

In recent years, recommender systems have advanced rapidly, where embedding learning for users and items plays a critical role. A standard method learns a unique embedding vector for each user and item. However, such a method has two important limitations in real-world applications: 1) it is hard to learn embeddings that generalize well for users and items with rare interactions; and 2) it may incur unbearably high memory costs when the number of users and items scales up. Existing approaches either can only address one of the limitations or have flawed overall performances. In this paper, we propose Clustered Embedding Learning (CEL) as an integrated solution to these two problems. CEL is a plug-and-play embedding learning framework that can be combined with any differentiable feature interaction model. It is capable of achieving improved performance, especially for cold users and items, with reduced memory cost. CEL enables automatic and dynamic clustering of users and items in a top-down fashion, where clustered entities jointly learn a shared embedding. The accelerated version of CEL has an optimal time complexity, which supports efficient online updates. Theoretically, we prove the identifiability and the existence of a unique optimal number of clusters for CEL in the context of nonnegative matrix factorization. Empirically, we validate the effectiveness of CEL on three public datasets and one business dataset, showing its consistently superior performance against current state-of-the-art methods. In particular, when incorporating CEL into the business model, it brings an improvement of in AUC, which translates into a significant revenue gain; meanwhile, the size of the embedding table gets 2650 times smaller.1

Adap-τ : Adaptively Modulating Embedding Magnitude for Recommendation

Recent years have witnessed the great successes of embedding-based methods in recommender systems. Despite their decent performance, we argue one potential limitation of these methods — the embedding magnitude has not been explicitly modulated, which may aggravate popularity bias and training instability, hindering the model from making a good recommendation. It motivates us to leverage the embedding normalization in recommendation. By normalizing user/item embeddings to a specific value, we empirically observe impressive performance gains (9% on average) on four real-world datasets. Although encouraging, we also reveal a serious limitation when applying normalization in recommendation — the performance is highly sensitive to the choice of the temperature τ which controls the scale of the normalized embeddings.

To fully foster the merits of the normalization while circumvent its limitation, this work studied on how to adaptively set the proper τ. Towards this end, we first make a comprehensive analyses of τ to fully understand its role on recommendation. We then accordingly develop an adaptive fine-grained strategy Adap-τ for the temperature with satisfying four desirable properties including adaptivity, personalized, efficiency and model-agnostic. Extensive experiments have been conducted to validate the effectiveness of the proposal. The code is available at https://github.com/junkangwu/Adap_tau.

Robust Preference-Guided Denoising for Graph based Social Recommendation

Graph Neural Network (GNN) based social recommendation models improve the prediction accuracy of user preference by leveraging GNN in exploiting preference similarity contained in social relations. However, in terms of both effectiveness and efficiency of recommendation, a large portion of social relations can be redundant or even noisy, e.g., it is quite normal that friends share no preference in a certain domain. Existing models do not fully solve this problem of relation redundancy and noise, as they directly characterize social influence over the full social network. In this paper, we instead propose to improve graph based social recommendation by only retaining the informative social relations to ensure an efficient and effective influence diffusion, i.e., graph denoising. Our designed denoising method is preference-guided to model social relation confidence and benefits user preference learning in return by providing a denoised but more informative social graph for recommendation models. Moreover, to avoid interference of noisy social relations, it designs a self-correcting curriculum learning module and an adaptive denoising strategy, both favoring highly-confident samples. Experimental results on three public datasets demonstrate its consistent capability of improving three state-of-the-art social recommendation models by robustly removing 10-40% of original relations. We release the source code at https://github.com/tsinghua-fib-lab/Graph-Denoising-SocialRec.

MMMLP: Multi-modal Multilayer Perceptron for Sequential Recommendations

Sequential recommendation aims to offer potentially interesting products to users by capturing their historical sequence of interacted items. Although it has facilitated extensive physical scenarios, sequential recommendation for multi-modal sequences has long been neglected. Multi-modal data that depicts a user’s historical interactions exists ubiquitously, such as product pictures, textual descriptions, and interacted item sequences, providing semantic information from multiple perspectives that comprehensively describe a user’s preferences. However, existing sequential recommendation methods either fail to directly handle multi-modality or suffer from high computational complexity. To address this, we propose a novel Multi-Modal Multi-Layer Perceptron (MMMLP) for maintaining multi-modal sequences for sequential recommendation. MMMLP is a purely MLP-based architecture that consists of three modules - the Feature Mixer Layer, Fusion Mixer Layer, and Prediction Layer - and has an edge on both efficacy and efficiency. Extensive experiments show that MMMLP achieves state-of-the-art performance with linear complexity. We also conduct ablating analysis to verify the contribution of each component. Furthermore, compatible experiments are devised, and the results show that the multi-modal representation learned by our proposed model generally benefits other recommendation models, emphasizing our model’s ability to handle multi-modal information. We have made our code available online to ease reproducibility1.

Response-act Guided Reinforced Dialogue Generation for Mental Health Counseling

Virtual Mental Health Assistants (VMHAs) have become a prevalent method for receiving mental health counseling in the digital healthcare space. An assistive counseling conversation commences with natural open-ended topics to familiarize the client with the environment and later converges into more fine-grained domain-specific topics. Unlike other conversational systems, which are categorized as open-domain or task-oriented systems, VMHAs possess a hybrid conversational flow. These counseling bots need to comprehend various aspects of the conversation, such as dialogue-acts, intents, etc., to engage the client in an effective and appropriate conversation. Although the surge in digital health research highlights applications of many general-purpose response generation systems, they are barely suitable in the mental health domain – the prime reason is the lack of understanding in the mental health counseling conversation. Moreover, in general, dialogue-act guided response generators are either limited to a template-based paradigm or lack appropriate semantics in dialogue generation. To this end, we propose READER – a REsponse-Act guided reinforced Dialogue genERation model for the mental health counseling conversations. READER is built on transformer to jointly predict a potential dialogue-act dt + 1 for the next utterance (aka response-act) and to generate an appropriate response (ut + 1). Through the transformer-reinforcement-learning (TRL) with Proximal Policy Optimization (PPO), we guide the response generator to abide by dt + 1 and ensure the semantic richness of the responses via BERTScore in our reward computation. We evaluate READER on HOPE, a benchmark counseling conversation dataset and observe that it outperforms several baselines across several evaluation metrics – METEOR, ROUGE, and BERTScore.

Few-shot News Recommendation via Cross-lingual Transfer

The cold-start problem has been commonly recognized in recommendation systems and studied by following a general idea to leverage the abundant interaction records of warm users to infer the preference of cold users. However, the performance of these solutions is limited by the amount of records available from warm users to use. Thus, building a recommendation system based on few interaction records from a few users still remains a challenging problem for unpopular or early-stage recommendation platforms. This paper focuses on solving the few-shot recommendation problem for news recommendation based on two observations. First, news at different platforms (even in different languages) may share similar topics. Second, the user preference over these topics is transferable across different platforms. Therefore, we propose to solve the few-shot news recommendation problem by transferring the user-news preference from a many-shot source domain to a few-shot target domain. To bridge two domains that are even in different languages and without any overlapping users and news, we propose a novel unsupervised cross-lingual transfer model as the news encoder that aligns semantically similar news in two domains. A user encoder is constructed on top of the aligned news encoding and transfers the user preference from the source to target domain. Experimental results on two real-world news recommendation datasets show the superior performance of our proposed method on addressing few-shot news recommendation, comparing to the baselines. The source code can be found at https://github.com/taichengguo/Few-shot-NewsRec.

User Retention-oriented Recommendation with Decision Transformer

Improving user retention with reinforcement learning (RL) has attracted increasing attention due to its significant importance in boosting user engagement. However, training the RL policy from scratch without hurting users’ experience is unavoidable due to the requirement of trial-and-error searches. Furthermore, the offline methods, which aim to optimize the policy without online interactions, suffer from the notorious stability problem in value estimation or unbounded variance in counterfactual policy evaluation. To this end, we propose optimizing user retention with Decision Transformer (DT), which avoids the offline difficulty by translating the RL as an autoregressive problem. However, deploying the DT in recommendation is a non-trivial problem because of the following challenges: (1) deficiency in modeling the numerical reward value; (2) data discrepancy between the policy learning and recommendation generation; (3) unreliable offline performance evaluation. In this work, we, therefore, contribute a series of strategies for tackling the exposed issues. We first articulate an efficient reward prompt by weighted aggregation of meta embeddings for informative reward embedding. Then, we endow a weighted contrastive learning method to solve the discrepancy between training and inference. Furthermore, we design two robust offline metrics to measure user retention. Finally, the significant improvement in the benchmark datasets demonstrates the superiority of the proposed method. The implementation code is available at https://github.com/kesenzhao/DT4Rec.git.

Cooperative Retriever and Ranker in Deep Recommenders

Deep recommender systems (DRS) are intensively applied in modern web services. To deal with the massive web contents, DRS employs a two-stage workflow: retrieval and ranking, to generate its recommendation results. The retriever aims to select a small set of relevant candidates from the entire items with high efficiency; while the ranker, usually more precise but time-consuming, is supposed to further refine the best items from the retrieved candidates. Traditionally, the two components are trained either independently or within a simple cascading pipeline, which is prone to poor collaboration effect. Though some latest works suggested to train retriever and ranker jointly, there still exist many severe limitations: item distribution shift between training and inference, false negative, and misalignment of ranking order. As such, it remains to explore effective collaborations between retriever and ranker.

In this work, we present a novel framework for the joint training of retriever and ranker, named CoRR (Cooperative Retriever and Ranker). With CoRR, the retriever is improved by deriving high-quality training signals from the ranker, while the ranker is improved by learning to discriminate hard negatives sampled by the retriever. We introduce two critical techniques. Firstly, we develop an adaptive and scalable sampler based on the retriever, to generate hard negative samples for the ranker’s training. Compared with the widely-used exact top-k sampling, our method effectively alleviates the issues of false negative and item distribution shift, and thus improves the ranker’s discriminability. Secondly, we propose a novel asymptotic-unbiased estimation of KL divergence, which serves as the objective for knowledge distillation. The new objective can be efficiently optimized with commonly-used optimizers. More importantly, it leads to better alignment of ranking order between retriever and ranker, which helps to improve the retrieval quality. We conduct comprehensive experiments over four large-scale datasets, where CoRR outperforms both conventional DRS and the existing joint training methods with notable advantages. Our code will be open-sourced to facilitate future research.

Learning Vector-Quantized Item Representation for Transferable Sequential Recommenders

Recently, the generality of natural language text has been leveraged to develop transferable recommender systems. The basic idea is to employ pre-trained language models (PLM) to encode item text into item representations. Despite the promising transferability, the binding between item text and item representations might be too tight, leading to potential problems such as over-emphasizing the effect of text features and exaggerating the negative impact of domain gap. To address this issue, this paper proposes VQ-Rec, a novel approach to learning Vector-Quantized item representations for transferable sequential Recommenders. The main novelty of our approach lies in the new item representation scheme: it first maps item text into a vector of discrete indices (called item code), and then employs these indices to lookup the code embedding table for deriving item representations. Such a scheme can be denoted as “text ⟹ code ⟹ representation”. Based on this representation scheme, we further propose an enhanced contrastive pre-training approach, using semi-synthetic and mixed-domain code representations as hard negatives. Furthermore, we design a new cross-domain fine-tuning method based on a differentiable permutation-based network. Extensive experiments conducted on six public benchmarks demonstrate the effectiveness of the proposed approach, in both cross-domain and cross-platform settings. Code and pre-trained model are available at: https://github.com/RUCAIBox/VQ-Rec.

Show Me The Best Outfit for A Certain Scene: A Scene-aware Fashion Recommender System

Fashion recommendation (FR) has received increasing attention in the research of new types of recommender systems. Existing fashion recommender systems (FRSs) typically focus on clothing item suggestions for users in three scenarios: 1) how to best recommend fashion items preferred by users; 2) how to best compose a complete outfit, and 3) how to best complete a clothing ensemble. However, current FRSs often overlook an important aspect when making FR, that is, the compatibility of the clothing item or outfit recommendations is highly dependent on the scene context. To this end, we propose the scene-aware fashion recommender system (SAFRS), which uncovers a hitherto unexplored avenue where scene information is taken into account when constructing the FR model. More specifically, our SAFRS addresses this problem by encoding scene and outfit information in separation attention encoders and then fusing the resulting feature embeddings via a novel scene-aware compatibility score function. Extensive qualitative and quantitative experiments are conducted to show that our SAFRS model outperforms all baselines for every evaluated metric.

Multi-Behavior Recommendation with Cascading Graph Convolution Networks

Multi-behavior recommendation, which exploits auxiliary behaviors (e.g., click and cart) to help predict users’ potential interactions on the target behavior (e.g., buy), is regarded as an effective way to alleviate the data sparsity or cold-start issues in recommendation. Multi-behaviors are often taken in certain orders in real-world applications (e.g., click>cart>buy). In a behavior chain, a latter behavior usually exhibits a stronger signal of user preference than the former one does. Most existing multi-behavior models fail to capture such dependencies in a behavior chain for embedding learning. In this work, we propose a novel multi-behavior recommendation model with cascading graph convolution networks (named MB-CGCN). In MB-CGCN, the embeddings learned from one behavior are used as the input features for the next behavior’s embedding learning after a feature transformation operation. In this way, our model explicitly utilizes the behavior dependencies in embedding learning. Experiments on two benchmark datasets demonstrate the effectiveness of our model on exploiting multi-behavior data. It outperforms the best baseline by 33.7% and 35.9% on average over the two datasets in terms of Recall@10 and NDCG@10, respectively.

AutoMLP: Automated MLP for Sequential Recommendations

Sequential recommender systems aim to predict users’ next interested item given their historical interactions. However, a long-standing issue is how to distinguish between users’ long/short-term interests, which may be heterogeneous and contribute differently to the next recommendation. Existing approaches usually set pre-defined short-term interest length by exhaustive search or empirical experience, which is either highly inefficient or yields subpar results. The recent advanced transformer-based models can achieve state-of-the-art performances despite the aforementioned issue, but they have a quadratic computational complexity to the length of the input sequence. To this end, this paper proposes a novel sequential recommender system, AutoMLP, aiming for better modeling users’ long/short-term interests from their historical interactions. In addition, we design an automated and adaptive search algorithm for preferable short-term interest length via end-to-end optimization. Through extensive experiments, we show that AutoMLP has competitive performance against state-of-the-art methods, while maintaining linear computational complexity.

NASRec: Weight Sharing Neural Architecture Search for Recommender Systems

The rise of deep neural networks offers new opportunities in optimizing recommender systems. However, optimizing recommender systems using deep neural networks requires delicate architecture fabrication. We propose NASRec, a paradigm that trains a single supernet and efficiently produces abundant models/sub-architectures by weight sharing. To overcome the data multi-modality and architecture heterogeneity challenges in the recommendation domain, NASRec establishes a large supernet (i.e., search space) to search the full architectures. The supernet incorporates versatile choice of operators and dense connectivity to minimize human efforts for finding priors. The scale and heterogeneity in NASRec impose several challenges, such as training inefficiency, operator-imbalance, and degraded rank correlation. We tackle these challenges by proposing single-operator any-connection sampling, operator-balancing interaction modules, and post-training fine-tuning. Our crafted models, NASRecNet, show promising results on three Click-Through Rates (CTR) prediction benchmarks, indicating that NASRec outperforms both manually designed models and existing NAS methods with state-of-the-art performance. Our work is publicly available here.

Membership Inference Attacks Against Sequential Recommender Systems

Recent studies have demonstrated the vulnerability of recommender systems to membership inference attacks, which determine whether a user’s historical data was utilized for model training, posing serious privacy leakage issues. Existing works assumed that member and non-member users follow different recommendation modes, and then infer membership based on the difference vector between the user’s historical behaviors and the recommendation list. The previous frameworks are invalid against inductive recommendations, such as sequential recommendations, since the disparities of difference vectors constructed by the recommendations between members and non-members become imperceptible. This motivates us to dig deeper into the target model. In addition, most MIA frameworks assume that they can obtain some in-distribution data from the same distribution of the target data, which is hard to gain in recommender system.

To address these difficulties, we propose a Membership Inference Attack framework against sequential recommenders based on Model Extraction(ME-MIA). Specifically, we train a surrogate model to simulate the target model based on two universal loss functions. For a given behavior sequence, the loss functions ensure the recommended items and corresponding rank of the surrogate model are consistent with the target model’s recommendation. Due to the special training mode of the surrogate model, it is hard to judge which user is its member(non-member). Therefore, we establish a shadow model and use shadow model’s members(non-members) to train the attack model later. Next, we build a user feature generator to construct representative feature vectors from the shadow(surrogate) model. The crafting feature vectors are finally input into the attack model to identify users’ membership. Furthermore, to tackle the high cost of obtaining in-distribution data, we develop two variants of ME-MIA, realizing data-efficient and even data-free MIA by fabricating authentic in-distribution data. Notably, the latter is impossible in the previous works. Finally, we evaluate ME-MIA against multiple sequential recommendation models on three real-world datasets. Experimental results show that ME-MIA and its variants can achieve efficient extraction and outperform state-of-the-art algorithms in terms of attack performance.

Offline Policy Evaluation in Large Action Spaces via Outcome-Oriented Action Grouping

Offline policy evaluation (OPE) aims to accurately estimate the performance of a hypothetical policy using only historical data, which has drawn increasing attention in a wide range of applications including recommender systems and personalized medicine. With the presence of rising granularity of consumer data, many industries started exploring larger action candidate spaces to support more precise personalized action. While inverse propensity score (IPS) is a standard OPE estimator, it suffers from more severe variance issues with increasing action spaces. To address this issue, we theoretically prove that the estimation variance can be reduced by merging actions into groups while the distinction among these action effects on the outcome can induce extra bias. Motivated by these, we propose a novel IPS estimator with outcome-oriented action Grouping (GroupIPS), which leverages a Lipschitz regularized network to measure the distance of action effects in the embedding space and merges nearest action neighbors. This strategy enables more robust estimation by achieving smaller variances while inducing minor additional bias. Empirically, extensive experiments on both synthetic and real world datasets demonstrate the effectiveness of our proposed method.

Communicative MARL-based Relevance Discerning Network for Repetition-Aware Recommendation

The repeated user-item interaction now is becoming a common phenomenon in the e-commerce scenario. Due to its potential economic profit, various models are emerging to predict which item will be re-interacted based on the user-item interactions. In this specific scenario, item relevance is a critical factor that needs to be concerned, which tends to have different effects on the succeeding re-interacted one (i.e., stimulating or delaying its emergence). It is necessary to make a detailed discernment of item relevance for a better repetition-aware recommendation. Unfortunately, existing works usually mixed all these types, which may disturb the learning process and result in poor performance.

In this paper, we introduce a novel Communicative MARL-based Relevance Discerning Network (CARDfor short) to automatically discern the item relevance for a better repetition-aware recommendation. Specifically, CARDformalizes the item relevance discerning problem into a communication selection process in MARL. CARDtreats each unique interacted item as an agent and defines three different communication types over agents, which are stimulative, inhibitive, and noisy respectively. After this, CARDutilizes a Gumbel-enhanced classifier to distinguish the communication types among agents, and an attention-based Reactive Point Process is further designed to transmit the well-discerned stimulative and inhibitive incentives separately among all agents to make an effective collaboration for repetition decisions. Experimental results on two real-world e-commerce datasets show that our proposed method outperforms the state-of-the-art recommendation methods in terms of both sequential and repetition-aware recommenders. Furthermore, CARDis also deployed in the online sponsored search advertising system in Meituan, obtaining a performance improvement of over 1.5% and 1.2% in CTR and effective Cost Per Mille (eCPM) respectively, which is significant to the business.

Invariant Collaborative Filtering to Popularity Distribution Shift

Collaborative Filtering (CF) models, despite their great success, suffer from severe performance drops due to popularity distribution shifts, where these changes are ubiquitous and inevitable in real-world scenarios. Unfortunately, most leading popularity debiasing strategies, rather than tackling the vulnerability of CF models to varying popularity distributions, require prior knowledge of the test distribution to identify the degree of bias and further learn the popularity-entangled representations to mitigate the bias. Consequently, these models result in significant performance benefits in the target test set, while dramatically deviating the recommendation from users’ true interests without knowing the popularity distribution in advance. In this work, we propose a novel learning framework, Invariant Collaborative Filtering (InvCF), to discover disentangled representations that faithfully reveal the latent preference and popularity semantics without making any assumption about the popularity distribution. At its core is the distillation of unbiased preference representations (i.e., user preference on item property), which are invariant to the change of popularity semantics, while filtering out the popularity feature that is unstable or outdated. Extensive experiments on five benchmark datasets and four evaluation settings (i.e., synthetic long-tail, unbiased, temporal split, and out-of-distribution evaluations) demonstrate that InvCF outperforms the state-of-the-art baselines in terms of popularity generalization ability on real recommendations. Visualization studies shed light on the advantages of InvCF for disentangled representation learning. Our codes are available at https://github.com/anzhang314/InvCF.

Modeling Temporal Positive and Negative Excitation for Sequential Recommendation

Sequential recommendation aims to predict the next item which interests users via modeling their interest in items over time. Most of the existing works on sequential recommendation model users’ dynamic interest in specific items while overlooking users’ static interest revealed by some static attribute information of items, e.g., category, brand. Moreover, existing works often only consider the positive excitation of a user’s historical interactions on his/her next choice on candidate items while ignoring the commonly existing negative excitation, resulting in insufficiently modeling dynamic interest. The overlook of static interest and negative excitation will lead to incomplete interest modeling and thus impedes the recommendation performance. To this end, in this paper, we propose modeling both static interest and negative excitation for dynamic interest to further improve the recommendation performance. Accordingly, we design a novel Static-Dynamic Interest Learning (SDIL) framework featured with a novel Temporal Positive and Negative Excitation Modeling (TPNE) module for accurate sequential recommendation. TPNE is specially designed for comprehensively modeling dynamic interest based on temporal positive and negative excitation learning. Extensive experiments on three real-world datasets show that SDIL can effectively capture both static and dynamic interest and outperforms state-of-the-art baselines.

Personalized Graph Signal Processing for Collaborative Filtering

The collaborative filtering (CF) problem with only user-item interaction information can be solved by graph signal processing (GSP), which uses low-pass filters to smooth the observed interaction signals on the similarity graph to obtain the prediction signals. However, the interaction signal may not be sufficient to accurately characterize user interests and the low-pass filters may ignore the useful information contained in the high-frequency component of the observed signals, resulting in suboptimal accuracy. To this end, we propose a personalized graph signal processing (PGSP) method for collaborative filtering. Firstly, we design the personalized graph signal containing richer user information and construct an augmented similarity graph containing more graph topology information, to more effectively characterize user interests. Secondly, we devise a mixed-frequency graph filter to introduce useful information in the high-frequency components of the observed signals by combining an ideal low-pass filter that smooths signals globally and a linear low-pass filter that smooths signals locally. Finally, we combine the personalized graph signal, the augmented similarity graph and the mixed-frequency graph filter by proposing a pipeline consisting of three key steps: pre-processing, graph convolution and post-processing. Extensive experiments show that PGSP can achieve superior accuracy compared with state-of-the-art CF methods and, as a nonparametric method, PGSP has very high training efficiency.

Multi-Task Recommendations with Reinforcement Learning

In recent years, Multi-task Learning (MTL) has yielded immense success in Recommender System (RS) applications [40]. However, current MTL-based recommendation models tend to disregard the session-wise patterns of user-item interactions because they are predominantly constructed based on item-wise datasets. Moreover, balancing multiple objectives has always been a challenge in this field, which is typically avoided via linear estimations in existing works. To address these issues, in this paper, we propose a Reinforcement Learning (RL) enhanced MTL framework, namely RMTL, to combine the losses of different recommendation tasks using dynamic weights. To be specific, the RMTL structure can address the two aforementioned issues by (i) constructing an MTL environment from session-wise interactions and (ii) training multi-task actor-critic network structure, which is compatible with most existing MTL-based recommendation models, and (iii) optimizing and fine-tuning the MTL loss function using the weights generated by critic networks. Experiments on two real-world public datasets demonstrate the effectiveness of RMTL with a higher AUC against state-of-the-art MTL-based recommendation models. Additionally, we evaluate and validate RMTL’s compatibility and transferability across various MTL models.

A Self-Correcting Sequential Recommender

Sequential recommendations aim to capture users’ preferences from their historical interactions so as to predict the next item that they will interact with. Sequential recommendation methods usually assume that all items in a user’s historical interactions reflect her/his preferences and transition patterns between items. However, real-world interaction data is imperfect in that (i) users might erroneously click on items, i.e., so-called misclicks on irrelevant items, and (ii) users might miss items, i.e., unexposed relevant items due to inaccurate recommendations.

To tackle the two issues listed above, we propose STEAM, a Self-correcTing sEquentiAl recoMmender. STEAM first corrects an input item sequence by adjusting the misclicked and/or missed items. It then uses the corrected item sequence to train a recommender and make the next item prediction. We design an item-wise corrector that can adaptively select one type of operation for each item in the sequence. The operation types are ‘keep’, ‘delete’ and ‘insert.’ In order to train the item-wise corrector without requiring additional labeling, we design two self-supervised learning mechanisms: (i) deletion correction (i.e., deleting randomly inserted items), and (ii) insertion correction (i.e., predicting randomly deleted items). We integrate the corrector with the recommender by sharing the encoder and by training them jointly. We conduct extensive experiments on three real-world datasets and the experimental results demonstrate that STEAM outperforms state-of-the-art sequential recommendation baselines. Our in-depth analyses confirm that STEAM benefits from learning to correct the raw item sequences.

Cross-domain Recommendation with Behavioral Importance Perception

Cross-domain recommendation (CDR) aims to leverage the source domain information to provide better recommendation for the target domain, which is widely adopted in recommender systems to alleviate the data sparsity and cold-start problems. However, existing CDR methods mostly focus on designing effective model architectures to transfer the source domain knowledge, ignoring the behavior-level effect during the loss optimization process, where behaviors regarding different aspects in the source domain may have different importance for the CDR model optimization. The ignorance of the behavior-level effect will cause the carefully designed model architectures ending up with sub-optimal parameters, which limits the recommendation performance. To tackle the problem, we propose a generic behavioral importance-aware optimization framework for cross-domain recommendation (BIAO). Specifically, we propose a behavioral perceptron which predicts the importance of each source behavior according to the corresponding item’s global impact and local user-specific impact. The joint optimization process of the CDR model and the behavioral perceptron is formulated as a bi-level optimization problem. In the lower optimization, only the CDR model is updated with weighted source behavior loss and the target domain loss, while in the upper optimization, the behavioral perceptron is updated with implicit gradient from a developing dataset obtained through the proposed reorder-and-reuse strategy. Extensive experiments show that our proposed optimization framework consistently improves the performance of different cross-domain recommendation models in 7 cross-domain scenarios, demonstrating that our method can serve as a generic and powerful tool for cross-domain recommendation1.

Balancing Unobserved Confounding with a Few Unbiased Ratings in Debiased Recommendations

Recommender systems are seen as an effective tool to address information overload, but it is widely known that the presence of various biases makes direct training on large-scale observational data result in sub-optimal prediction performance. In contrast, unbiased ratings obtained from randomized controlled trials or A/B tests are considered to be the golden standard, but are costly and small in scale in reality. To exploit both types of data, recent works proposed to use unbiased ratings to correct the parameters of the propensity or imputation models trained on the biased dataset. However, the existing methods fail to obtain accurate predictions in the presence of unobserved confounding or model misspecification. In this paper, we propose a theoretically guaranteed model-agnostic balancing approach that can be applied to any existing debiasing method with the aim of combating unobserved confounding and model misspecification. The proposed approach makes full use of unbiased data by alternatively correcting model parameters learned with biased data, and adaptively learning balance coefficients of biased samples for further debiasing. Extensive real-world experiments are conducted along with the deployment of our proposal on four representative debiasing methods to demonstrate the effectiveness.

FedACK: Federated Adversarial Contrastive Knowledge Distillation for Cross-Lingual and Cross-Model Social Bot Detection

Social bot detection is of paramount importance to the resilience and security of online social platforms. The state-of-the-art detection models are siloed and have largely overlooked a variety of data characteristics from multiple cross-lingual platforms. Meanwhile, the heterogeneity of data distribution and model architecture make it intricate to devise an efficient cross-platform and cross-model detection framework. In this paper, we propose FedACK, a new federated adversarial contrastive knowledge distillation framework for social bot detection. We devise a GAN-based federated knowledge distillation mechanism for efficiently transferring knowledge of data distribution among clients. In particular, a global generator is used to extract the knowledge of global data distribution and distill it into each client’s local model. We leverage local discriminator to enable customized model design and use local generator for data enhancement with hard-to-decide samples. Local training is conducted as multi-stage adversarial and contrastive learning to enable consistent feature spaces among clients and to constrain the optimization direction of local models, reducing the divergences between local and global models. Experiments demonstrate that FedACK outperforms the state-of-the-art approaches in terms of accuracy, communication efficiency, and feature space consistency.

Code Recommendation for Open Source Software Developers

Open Source Software (OSS) is forming the spines of technology infrastructures, attracting millions of talents to contribute. Notably, it is challenging and critical to consider both the developers’ interests and the semantic features of the project code to recommend appropriate development tasks to OSS developers. In this paper, we formulate the novel problem of code recommendation, whose purpose is to predict the future contribution behaviors of developers given their interaction history, the semantic features of source code, and the hierarchical file structures of projects. We introduce CODER, a novel graph-based CODE Recommendation framework for open source software developers, which accounts for the complex interactions among multiple parties within the system. CODER jointly models microscopic user-code interactions and macroscopic user-project interactions via a heterogeneous graph and further bridges the two levels of information through aggregation on file-structure graphs that reflect the project hierarchy. Moreover, to overcome the lack of reliable benchmarks, we construct three large-scale datasets to facilitate future research in this direction. Extensive experiments show that our CODER framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation.

Web Table Formatting Affects Readability on Mobile Devices

Reading large tables on small mobile screens presents serious usability challenges that can be addressed, in part, by better table formatting. However, there are few evidenced-based guidelines for formatting mobile tables to improve readability. For this work, we first conducted a survey to investigate how people interact with tables on mobile devices and conducted a study with designers to identify which design considerations are most critical. Based on these findings, we designed and conducted three large scale studies with remote crowdworker participants. Across the studies, we analyze over 14,000 trials from 590 participants who each viewed and answered questions about 28 diverse tables rendered in different formats. We find that smaller cell padding and frozen headers lead to faster task completion, and that while zebra striping and row borders do not speed up tasks, they are still subjectively preferred by participants.

Web Structure Derived Clustering for Optimised Web Accessibility Evaluation

Web accessibility evaluation is a costly and complex process due to limited time, resources and ambiguity. To optimise the accessibility evaluation process, we aim to reduce the number of pages auditors must review by employing statistically representative pages, reducing a site of thousands of pages to a manageable review of archetypal pages. Our paper focuses on representativeness, one of six proposed metrics that form our methodology, to address the limitations we have identified with the W3C Website Accessibility Conformance Evaluation Methodology (WCAG-EM). These include the evaluative scope, the non-probabilistic sampling approach, and the potential for bias within the selected sample. Representativeness, in particular, is a metric to assess the quality and coverage of sampling. To measure this, we systematically evaluate five web page representations with a website of 388 pages, including tags, structure, the DOM tree, content, and a mixture of structure and content. Our findings highlight the importance of including structural components in representations. We validate our conclusions using the same methodology for three additional random sites of 500 pages. As an exclusive attribute, we find that features derived from web content are suboptimal and can lead to lower quality and more disparate clustering for optimised accessibility evaluation.

Denoising and Prompt-Tuning for Multi-Behavior Recommendation

In practical recommendation scenarios, users often interact with items under multi-typed behaviors (e.g., click, add-to-cart, and purchase). Traditional collaborative filtering techniques typically assume that users only have a single type of behavior with items, making it insufficient to utilize complex collaborative signals to learn informative representations and infer actual user preferences. Consequently, some pioneer studies explore modeling multi-behavior heterogeneity to learn better representations and boost the performance of recommendations for a target behavior. However, a large number of auxiliary behaviors (i.e., click and add-to-cart) could introduce irrelevant information to recommenders, which could mislead the target behavior (i.e., purchase) recommendation, rendering two critical challenges: (i) denoising auxiliary behaviors and (ii) bridging the semantic gap between auxiliary and target behaviors. Motivated by the above observation, we propose a novel framework–Denoising and Prompt-Tuning (DPT) with a three-stage learning paradigm to solve the aforementioned challenges. In particular, DPT is equipped with a pattern-enhanced graph encoder in the first stage to learn complex patterns as prior knowledge in a data-driven manner to guide learning informative representation and pinpointing reliable noise for subsequent stages. Accordingly, we adopt different lightweight tuning approaches with effectiveness and efficiency in the following stages to further attenuate the influence of noise and alleviate the semantic gap among multi-typed behaviors. Extensive experiments on two real-world datasets demonstrate the superiority of DPT over a wide range of state-of-the-art methods. The implementation code is available online at https://github.com/zc-97/DPT.

pFedPrompt: Learning Personalized Prompt for Vision-Language Models in Federated Learning

Pre-trained vision-language models like CLIP show great potential in learning representations that capture latent characteristics of users. A recently proposed method called Contextual Optimization (CoOp) introduces the concept of training prompt for adapting pre-trained vision-language models. Given the lightweight nature of this method, researchers have migrated the paradigm from centralized to decentralized system to innovate the collaborative training framework of Federated Learning (FL). However, current prompt training in FL mainly focuses on modeling user consensus and lacks the adaptation to user characteristics, leaving the personalization of prompt largely under-explored. Researches over the past few years have applied personalized FL (pFL) approaches to customizing models for heterogeneous users. Unfortunately, we find that with the variation of modality and training behavior, directly applying the pFL methods to prompt training leads to insufficient personalization and performance. To bridge the gap, we present pFedPrompt, which leverages the unique advantage of multimodality in vision-language models by learning user consensus from linguistic space and adapting to user characteristics in visual space in a non-parametric manner. Through this dual collaboration, the learned prompt will be fully personalized and aligned to the user’s local characteristics. We conduct extensive experiments across various datasets under the FL setting with statistical heterogeneity. The results demonstrate the superiority of our pFedPrompt against the alternative approaches with robust performance.

Mutual Wasserstein Discrepancy Minimization for Sequential Recommendation

Self-supervised sequential recommendation significantly improves recommendation performance by maximizing mutual information with well-designed data augmentations. However, the mutual information estimation is based on the calculation of Kullback–Leibler divergence with several limitations, including asymmetrical estimation, the exponential need of the sample size, and training instability. Also, existing data augmentations are mostly stochastic and can potentially break sequential correlations with random modifications. These two issues motivate us to investigate an alternative robust mutual information measurement capable of modeling uncertainty and alleviating KL divergence’s limitations.

To this end, we propose a novel self-supervised learning framework based on the Mutual WasserStein discrepancy minimization (MStein) for the sequential recommendation. We propose the Wasserstein Discrepancy Measurement to measure the mutual information between augmented sequences. Wasserstein Discrepancy Measurement builds upon the 2-Wasserstein distance, which is more robust, more efficient in small batch sizes, and able to model the uncertainty of stochastic augmentation processes. We also propose a novel contrastive learning loss based on Wasserstein Discrepancy Measurement. Extensive experiments on four benchmark datasets demonstrate the effectiveness of MStein over baselines. More quantitative analyses show the robustness against perturbations and training efficiency in batch size. Finally, improvements analysis indicates better representations of popular users/items with significant uncertainty. The source code is in https://github.com/zfan20/MStein.

Confident Action Decision via Hierarchical Policy Learning for Conversational Recommendation

Conversational recommender systems (CRS) aim to acquire a user’s dynamic interests for a successful recommendation. By asking about his/her preferences, CRS explore current needs of a user and recommend items of interest. However, previous works may not determine a proper action in a timely manner which leads to the insufficient information gathering and the waste of conversation turns. Since they learn a single decision policy, it is difficult for them to address the general decision problems in CRS. Besides, existing methods do not distinguish whether the past behaviors inferred from the historical interactions are closely related to the user’s current preference. To address these issues, we propose a novel Hierarchical policy learning based Conversational Recommendation framework (HiCR). HiCR formulates the multi-round decision making process as a hierarchical policy learning scheme, which consists of both a high-level policy and a low-level policy. In detail, the high-level policy aims to determine what type of action to take, such as a recommendation or a query, by observing the comprehensive conversation information. According to the decided action type, the low-level policy selects a specific action, such as which attribute to ask or which item to recommend. The hierarchical conversation policy enables CRS to decide an optimal action, resulting in reducing the unnecessary consumption of conversation turns and the continuous failure of recommendations. Furthermore, in order to filter out the unnecessary historical information when enriching the current user preference, we extract and utilize the informative past behaviors that are attentive to the current needs. Empirical experiments on four real-world datasets show the superiority of our approach against the current state-of-the-art methods.

CAMUS: Attribute-Aware Counterfactual Augmentation for Minority Users in Recommendation

Embedding-based methods currently achieved impressive success in recommender systems. However, such methods are more likely to suffer from bias in data distribution, especially the attribute bias problem. For example, when a certain type of user, like the elderly, occupies the mainstream, the recommendation results of minority users would be seriously affected by the mainstream users’ attributes. To address this problem, most existing methods are proposed from the perspective of fairness, which focuses on eliminating unfairness but deteriorates the recommendation performance. Unlike these methods, in this paper, we focus on improving the recommendation performance for minority users of biased attributes. Along this line, we propose a novel attribute-aware Counterfactual Augmentation framework for Minority Users(CAMUS). Specifically, the CAMUS consists of a counterfactual augmenter, a confidence estimator, and a recommender. The counterfactual augmenter conducts data augmentation for the minority group by utilizing the interactions of mainstream users based on a universal counterfactual assumption. Besides, a tri-training-based confidence estimator is applied to ensure the effectiveness of augmentation. Extensive experiments on three real-world datasets have demonstrated the superior performance of the proposed methods. Further case studies verify the universality of the proposed CAMUS framework on different data sparsity, attributes, and models.

SESSION: Web Mining and Content Analysis

Word Sense Disambiguation by Refining Target Word Embedding

Word Sense Disambiguation (WSD) which aims to identify the correct sense of a target word appearing in a specific context is essential for web text analysis. The use of glosses has been explored as a means for WSD. However, only a few works model the correlation between the target context and gloss. We add to the body of literature by presenting a model that employs a multi-head attention mechanism on deep contextual features of the target word and candidate glosses to refine the target word embedding. Furthermore, to encourage the model to learn the relevant part of target features that align with the correct gloss, we recursively alternate attention on target word features and that of candidate glosses to gradually extract the relevant contextual features of the target word, refining its representation and strengthening the final disambiguation results. Empirical studies on the five most commonly used benchmark datasets show that our proposed model is effective and achieves state-of-the-art results.

Hashtag-Guided Low-Resource Tweet Classification

Social media classification tasks (e.g., tweet sentiment analysis, tweet stance detection) are challenging because social media posts are typically short, informal, and ambiguous. Thus, training on tweets is challenging and demands large-scale human-annotated labels, which are time-consuming and costly to obtain. In this paper, we find that providing hashtags to social media tweets can help alleviate this issue because hashtags can enrich short and ambiguous tweets in terms of various information, such as topic, sentiment, and stance. This motivates us to propose a novel Hashtag-guided Tweet Classification model (HashTation), which automatically generates meaningful hashtags for the input tweet to provide useful auxiliary signals for tweet classification. To generate high-quality and insightful hashtags, our hashtag generation model retrieves and encodes the post-level and entity-level information across the whole corpus. Experiments show that HashTation achieves significant improvements on seven low-resource tweet classification tasks, in which only a limited amount of training data is provided, showing that automatically enriching tweets with model-generated hashtags could significantly reduce the demand for large-scale human-labeled data. Further analysis demonstrates that HashTation is able to generate high-quality hashtags that are consistent with the tweets and their labels. The code is available at https://github.com/shizhediao/HashTation.

HISum: Hyperbolic Interaction Model for Extractive Multi-Document Summarization

Extractive summarization helps provide a short description or a digest of news or other web texts. It enhances the reading experience of users, especially when they are reading on small displays (e.g., mobile phones). Matching-based methods are recently proposed for the extractive summarization task, which extracts a summary from a global view via a document-summary matching framework. However, these methods only calculate similarities between candidate summaries and the entire document embeddings, insufficiently capturing interactions between different contextual information in the document to accurately estimate the importance of candidates. In this paper, we propose a new hyperbolic interaction model for extractive multi-document summarization (HISum). Specifically, HISum first learns document and candidate summary representations in the same hyperbolic space to capture latent hierarchical structures and then estimates the importance scores of candidates by jointly modeling interactions between each candidate and the document from global and local views. Finally, the importance scores are used to rank and extract the best candidate as the extracted summary. Experimental results on several benchmarks show that HISum outperforms the state-of-the-art extractive baselines1.

FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification

Deep learning-based algorithms, e.g., convolutional networks, have significantly facilitated multivariate time series classification (MTSC) task. Nevertheless, they suffer from the limitation in modeling long-range dependence due to the nature of convolution operations. Recent advancements have shown the potential of transformers to capture long-range dependence. However, it would incur severe issues, such as fixed scale representations, temporal-invariant and quadratic time complexity, with transformers directly applicable to the MTSC task because of the distinct properties of time series data. To tackle these issues, we propose FormerTime, an hierarchical representation model for improving the classification capacity for the MTSC task. In the proposed FormerTime, we employ a hierarchical network architecture to perform multi-scale feature maps. Besides, a novel transformer encoder is further designed, in which an efficient temporal reduction attention layer and a well-informed contextual positional encoding generating strategy are developed. To sum up, FormerTime exhibits three aspects of merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strength of both transformers and convolutional networks, and (3) tacking the efficiency challenges incurred by the self-attention mechanism. Extensive experiments performed on 10 publicly available datasets from UEA archive verify the superiorities of the FormerTime compared to previous competitive baselines.

Descartes: Generating Short Descriptions of Wikipedia Articles

Wikipedia is one of the richest knowledge sources on the Web today. In order to facilitate navigating, searching, and maintaining its content, Wikipedia’s guidelines state that all articles should be annotated with a so-called short description indicating the article’s topic (e.g., the short description of beer is “Alcoholic drink made from fermented cereal grains”). Nonetheless, a large fraction of articles (ranging from 10.2% in Dutch to 99.7% in Kazakh) have no short description yet, with detrimental effects for millions of Wikipedia users. Motivated by this problem, we introduce the novel task of automatically generating short descriptions for Wikipedia articles and propose Descartes, a multilingual model for tackling it. Descartes integrates three sources of information to generate an article description in a target language: the text of the article in all its language versions, the already-existing descriptions (if any) of the article in other languages, and semantic type information obtained from a knowledge graph. We evaluate a Descartes model trained for handling 25 languages simultaneously, showing that it beats baselines (including a strong translation-based baseline) and performs on par with monolingual models tailored for specific languages. A human evaluation on three languages further shows that the quality of Descartes’s descriptions is largely indistinguishable from that of human-written descriptions; e.g., 91.3% of our English descriptions (vs. 92.1% of human-written descriptions) pass the bar for inclusion in Wikipedia, suggesting that Descartes is ready for production, with the potential to support human editors in filling a major gap in today’s Wikipedia across languages.

Dynamically Expandable Graph Convolution for Streaming Recommendation

Personalized recommender systems have been widely studied and deployed to reduce information overload and satisfy users’ diverse needs. However, conventional recommendation models solely conduct a one-time training-test fashion and can hardly adapt to evolving demands, considering user preference shifts and ever-increasing users and items in the real world. To tackle such challenges, the streaming recommendation is proposed and has attracted great attention recently. Among these, continual graph learning is widely regarded as a promising approach for the streaming recommendation by academia and industry. However, existing methods either rely on the historical data replay which is often not practical under increasingly strict data regulations, or can seldom solve the over-stability issue. To overcome these difficulties, we propose a novel Dynamically Expandable Graph Convolution (DEGC) algorithm from a model isolation perspective for the streaming recommendation which is orthogonal to previous methods. Based on the motivation of disentangling outdated short-term preferences from useful long-term preferences, we design a sequence of operations including graph convolution pruning, refining, and expanding to only preserve beneficial long-term preference-related parameters and extract fresh short-term preferences. Moreover, we model the temporal user preference, which is utilized as user embedding initialization, for better capturing the individual-level preference shifts. Extensive experiments on the three most representative GCN-based recommendation models and four industrial datasets demonstrate the effectiveness and robustness of our method.

A Dual Prompt Learning Framework for Few-Shot Dialogue State Tracking

Dialogue State Tracking (DST) module is an essential component of task-oriented dialog systems to understand users’ goals and needs. Collecting dialogue state labels including slots and values can be costly, requiring experts to annotate all (slot, value) information for each turn in dialogues. It is also difficult to define all possible slots and values in advance, especially with the wide application of dialogue systems in more and more new-rising applications. In this paper, we focus on improving DST module to generate dialogue states in circumstances with limited annotations and knowledge about slot ontology. To this end, we design a dual prompt learning framework for few-shot DST. The dual framework aims to explore how to utilize the language understanding and generation capabilities of pre-trained language models for DST efficiently. Specifically, we consider the learning of slot generation and value generation as dual tasks, and two kinds of prompts are designed based on this dual structure to incorporate task-related knowledge of these two tasks respectively. In this way, the DST task can be formulated as a language modeling task efficiently under few-shot settings. To evaluate the proposed framework, we conduct experiments on two task-oriented dialogue datasets. The results demonstrate that the proposed method not only outperforms existing state-of-the-art few-shot methods, but also can generate unseen slots. It indicates that DST-related knowledge can be probed from pre-trained language models and utilized to address low-resource DST efficiently with the help of prompt learning.

Dual Policy Learning for Aggregation Optimization in Graph Neural Network-based Recommender Systems

Graph Neural Networks (GNNs) provide effective representations for recommendation tasks. GNN-based recommendation systems (GNN-Rs) capture the complex high-order connectivity between users and items by aggregating information from distant neighbors and can improve the performance of recommender systems. Recently, Knowledge Graphs (KGs) have also been incorporated into the user-item interaction graph to provide more abundant contextual information; they are exploited to address cold-start problems and enable more explainable aggregation in GNN-Rs. However, due to the heterogeneous nature of users and items, developing an effective aggregation strategy that works across multiple GNN-Rs, such as LightGCN and KGAT, remains a challenge. In this paper, we propose a novel reinforcement learning-based message passing framework for recommender systems, which we call DPAO (Dual Policy learning framework for Aggregation Optimization). This framework adaptively determines high-order connectivity to aggregate users and items using dual policy learning. Dual policy learning leverages two Deep-Q-Network models to exploit the user- and item-aware feedback from a GNN-R and boost the performance of the target GNN-R. Our proposed framework was evaluated with both non-KG-based and KG-based GNN-R models on six real-world datasets, and their results show that our proposed framework significantly enhances the recent base model, improving nDCG and Recall by up to 63.7% and 42.9%, respectively. Our implementation code is available at https://github.com/steve30572/DPAO/.

CL-WSTC: Continual Learning for Weakly Supervised Text Classification on the Internet

Continual text classification is an important research direction in Web mining. Existing works are limited to supervised approaches relying on abundant labeled data, but in the open and dynamic environment of Internet, involving constant semantic change of known topics and the appearance of unknown topics, text annotations are hard to access in time for each period. That calls for the technique of weakly supervised text classification (WSTC), which requires just seed words for each category and has succeed in static text classification tasks. However, there are still no studies of applying WSTC methods in a continual learning paradigm to actually accommodate the open and evolving Internet. In this paper, we tackle this problem for the first time and propose a framework, named Continual Learning for Weakly Supervised Text Classification (CL-WSTC), which can take any WSTC method as base model. It consists of two modules, classification decision with delay and seed word updating. In the former, the probability threshold for each category in each period is adaptively learned to determine the acceptance/rejection of texts. In the latter, with candidate words output by the base model, seed words are added and deleted via reinforcement learning with immediate rewards, according to an empirically certified unsupervised measure. Extensive experiments show that our approach has strong universality and can achieve a better trade-off between classification accuracy and decision timeliness compared to non-continual counterparts, with intuitively interpretable updating of seed words.

TTS: A Target-based Teacher-Student Framework for Zero-Shot Stance Detection

The goal of zero-shot stance detection (ZSSD) is to identify the stance (in favor of, against, or neutral) of a text towards an unseen target in the inference stage. In this paper, we explore this problem from a novel angle by proposing a Target-based Teacher-Student learning (TTS) framework. Specifically, we first augment the training set by extracting diversified targets that are unseen during training with a keyphrase generation model. Then, we develop a teacher-student framework which effectively utilizes the augmented data. Extensive experiments show that our model significantly outperforms state-of-the-art ZSSD baselines on the available benchmark dataset for this task by 8.9% in macro-averaged F1. In addition, previous ZSSD requires human-annotated targets and labels during training, which may not be available in real-world applications. Therefore, we go one step further by proposing a more challenging open-world ZSSD task: identifying the stance of a text towards an unseen target without human-annotated targets and stance labels. We show that our TTS can be easily adapted to the new task. Remarkably, TTS without human-annotated targets and stance labels even significantly outperforms previous state-of-the-art ZSSD baselines trained with human-annotated data. We publicly release our code 1 to facilitate future research.

Learning Robust Multi-Modal Representation for Multi-Label Emotion Recognition via Adversarial Masking and Perturbation

Recognizing emotions from multi-modal data is an emotion recognition task that requires strong multi-modal representation ability. The general approach to this task is to naturally train the representation model on training data without intervention. However, such natural training scheme is prone to modality bias of representation (i.e., tending to over-encode some informative modalities while neglecting other modalities) and data bias of training (i.e., tending to overfit training data). These biases may lead to instability (e.g., performing poorly when the neglected modality is dominant for recognition) and weak generalization (e.g., performing poorly when unseen data is inconsistent with overfitted data) of the model on unseen data. To address these problems, this paper presents two adversarial training strategies to learn more robust multi-modal representation for multi-label emotion recognition. Firstly, we propose an adversarial temporal masking strategy, which can enhance the encoding of other modalities by masking the most emotion-related temporal units (e.g., words for text or frames for video) of the informative modality. Secondly, we propose an adversarial parameter perturbation strategy, which can enhance the generalization of the model by adding the adversarial perturbation to the parameters of model. Both strategies boost model performance on the benchmark MMER datasets CMU-MOSEI and NEMu. Experimental results demonstrate the effectiveness of the proposed method compared with the previous state-of-the-art method. Code will be released at https://github.com/ShipingGe/MMER.

Continual Few-shot Learning with Transformer Adaptation and Knowledge Regularization

Continual few-shot learning, as a paradigm that simultaneously solves continual learning and few-shot learning, has become a challenging problem in machine learning. An eligible continual few-shot learning model is expected to distinguish all seen classes upon new categories arriving, where each category only includes very few labeled data. However, existing continual few-shot learning methods only consider the visual modality, where the distributions of new categories often indistinguishably overlap with old categories, thus resulting in the severe catastrophic forgetting problem. To tackle this problem, in this paper we study continual few-shot learning with the assistance of semantic knowledge by simultaneously taking both visual modality and semantic concepts of categories into account. We propose a Continual few-shot learning algorithm with Semantic knowledge Regularization (CoSR) for adapting to the distribution changes of visual prototypes through a Transformer-based prototype adaptation mechanism. Specifically, the original visual prototypes from the backbone are fed into the well-designed Transformer with corresponding semantic concepts, where the semantic concepts are extracted from all categories. The semantic-level regularization forces the categories with similar semantics to be closely distributed, while the opposite ones are constrained to be far away from each other. The semantic regularization improves the model’s ability to distinguish between new and old categories, thus significantly mitigating the catastrophic forgetting problem in continual few-shot learning. Extensive experiments on CIFAR100, miniImageNet, CUB200 and an industrial dataset with long-tail distribution demonstrate the advantages of our CoSR model compared with state-of-the-art methods.

Addressing Heterophily in Graph Anomaly Detection: A Perspective of Graph Spectrum

Graph anomaly detection (GAD) suffers from heterophily — abnormal nodes are sparse so that they are connected to vast normal nodes. The current solutions upon Graph Neural Networks (GNNs) blindly smooth the representation of neiboring nodes, thus undermining the discriminative information of the anomalies. To alleviate the issue, recent studies identify and discard inter-class edges through estimating and comparing the node-level representation similarity. However, the representation of a single node can be misleading when the prediction error is high, thus hindering the performance of the edge indicator.

In graph signal processing, the smoothness index is a widely adopted metric which plays the role of frequency in classical spectral analysis. Considering the ground truth Y to be a signal on graph, the smoothness index is equivalent to the value of the heterophily ratio. From this perspective, we aim to address the heterophily problem in the spectral domain. First, we point out that heterophily is positively associated with the frequency of a graph. Towards this end, we could prune inter-class edges by simply emphasizing and delineating the high-frequency components of the graph. Recall that graph Laplacian is a high-pass filter, we adopt it to measure the extent of 1-hop label changing of the center node and indicate high-frequency components. As GAD can be formulated as a semi-supervised binary classification problem, only part of the nodes are labeled. As an alternative, we use the prediction of the nodes to estimate it. Through our analysis, we show that prediction errors are less likely to affect the identification process. Extensive empirical evaluations on four benchmarks demonstrate the effectiveness of the indicator over popular homophilic, heterophilic, and tailored fraud detection methods. Our proposed indicator can effectively reduce the heterophily degree of the graph, thus boosting the overall GAD performance. Codes are open-sourced in https://github.com/blacksingular/GHRN.

CTRLStruct: Dialogue Structure Learning for Open-Domain Response Generation

Dialogue structure discovery is essential in dialogue generation. Well-structured topic flow can leverage background information and predict future topics to help generate controllable and explainable responses. However, most previous work focused on dialogue structure learning in task-oriented dialogue other than open-domain dialogue which is more complicated and challenging. In this paper, we present a new framework CTRLStruct for dialogue structure learning to effectively explore topic-level dialogue clusters as well as their transitions with unlabelled information. Precisely, dialogue utterances encoded by bi-directional Transformer are further trained through a special designed contrastive learning task to improve representation. Then we perform clustering to utterance-level representations and form topic-level clusters that can be considered as vertices in dialogue structure graph. The edges in the graph indicating transition probability between vertices are calculated by mimicking expert behavior in datasets. Finally, dialogue structure graph is integrated into dialogue model to perform controlled response generation. Experiments on two popular open-domain dialogue datasets show our model can generate more coherent responses compared to some excellent dialogue models, as well as outperform some typical sentence embedding methods in dialogue utterance representation. Code is available in GitHub1.

KAE-Informer: A Knowledge Auto-Embedding Informer for Forecasting Long-Term Workloads of Microservices

Accurately forecasting workloads in terms of throughput that is quantified as queries per second (QPS) is essential for microservices to elastically adjust their resource allocations. However, long-term QPS prediction is challenging in two aspects: 1) generality across various services with different temporal patterns, 2) characterization of intricate QPS sequences which are entangled by multiple components. In this paper, we propose a knowledge auto-embedding Informer network (KAE-Informer) for forecasting the long-term QPS sequences of microservices. By analyzing a large number of microservice traces, we discover that there are two main decomposable and predictable components in QPS sequences, namely global trend & dominant periodicity (TP) and low-frequency residual patterns with long-range dependencies. These two components are important for accurately forecasting long-term QPS. First, KAE-Informer embeds the knowledge of TP components through mathematical modeling. Second, KAE-Informer designs a convolution ProbSparse self-attention mechanism and a multi-layer event discrimination scheme to extract and embed the knowledge of local context awareness and event regression effect implied in residual components, respectively. We conduct experiments based on three real datasets including a QPS dataset collected from 40 microservices. The experiment results show that KAE-Informer achieves a reduction of MAPE, MAE and RMSE by about 16.6%, 17.6% and 23.1% respectively, compared to the state-of-the-art models.

Open-World Social Event Classification

With the rapid development of Internet and the expanding scale of social media, social event classification has attracted increasing attention. The key to social event classification is effectively leveraging the visual and textual semantics for classification. However, most of the existing approaches may suffer from the following limitations: (1) Most of them just simply concatenate the image features and text features to get the multimodal features and ignore the fine-grained semantic relationship between modalities. (2) The majority of them hold the closed-world assumption that all classes in test are already seen in training, while this assumption can be easily broken in real-world applications. In practice, new events on Internet may not belong to any existing/seen class, and therefore cannot be correctly identified by closed-world learning algorithms. To tackle these challenges, we propose an Open-World Social Event Classifier (OWSEC) model in this paper. Firstly, we design a multimodal mask transformer network to capture cross-modal semantic relations and fuse fine-grained multimodal features of social events while masking redundant information. Secondly, we design an open-world classifier and propose a cross-modal event mixture mechanism with a novel open-world classification loss to capture the potential distribution space of the unseen class. Extensive experiments on two public datasets demonstrate the superiority of our proposed OWSEC model for open-world social event classification.

KHAN: Knowledge-Aware Hierarchical Attention Networks for Accurate Political Stance Prediction

The political stance prediction for news articles has been widely studied to mitigate the echo chamber effect – people fall into their thoughts and reinforce their pre-existing beliefs. The previous works for the political stance problem focus on (1) identifying political factors that could reflect the political stance of a news article and (2) capturing those factors effectively. Despite their empirical successes, they are not sufficiently justified in terms of how effective their identified factors are in the political stance prediction. Motivated by this, in this work, we conduct a user study to investigate important factors in political stance prediction, and observe that the context and tone of a news article (implicit) and external knowledge for real-world entities appearing in the article (explicit) are important in determining its political stance. Based on this observation, we propose a novel knowledge-aware approach to political stance prediction (KHAN), employing (1) hierarchical attention networks (HAN) to learn the relationships among words and sentences in three different levels and (2) knowledge encoding (KE) to incorporate external knowledge for real-world entities into the process of political stance prediction. Also, to take into account the subtle and important difference between opposite political stances, we build two independent political knowledge graphs (KG) (i.e., KG-lib and KG-con) by ourselves and learn to fuse the different political knowledge. Through extensive evaluations on three real-world datasets, we demonstrate the superiority of KHAN in terms of (1) accuracy, (2) efficiency, and (3) effectiveness.

Improving (Dis)agreement Detection with Inductive Social Relation Information From Comment-Reply Interactions

(Dis)agreement detection aims to identify the authors’ attitudes or positions (agree, disagree, neutral) towards a specific text. It is limited for existing methods merely using textual information for identifying (dis)agreements, especially for cross-domain settings. Social relation information can play an assistant role in the (dis)agreement task besides textual information. We propose a novel method to extract such relation information from (dis)agreement data into an inductive social relation graph, merely using the comment-reply pairs without any additional platform-specific information. The inductive social relation globally considers the historical discussion and the relation between authors. Textual information based on a pre-trained language model and social relation information encoded by pre-trained RGCN are jointly considered for (dis)agreement detection. Experimental results show that our model achieves state-of-the-art performance for both the in-domain and cross-domain tasks on the benchmark – DEBAGREEMENT. We find social relations can boost the performance of the (dis)agreement detection model, especially for the long-token comment-reply pairs, demonstrating the effectiveness of the social relation graph. We also explore the effect of the knowledge graph embedding methods, the information fusing method, and the time interval in constructing the social relation graph, which shows the effectiveness of our model.

Self-training through Classifier Disagreement for Cross-Domain Opinion Target Extraction

Opinion target extraction (OTE) or aspect extraction (AE) is a fundamental task in opinion mining that aims to extract the targets (or aspects) on which opinions have been expressed. Recent work focus on cross-domain OTE, which is typically encountered in real-world scenarios, where the testing and training distributions differ. Most methods use domain adversarial neural networks that aim to reduce the domain gap between the labelled source and unlabelled target domains to improve target domain performance. However, this approach only aligns feature distributions and does not account for class-wise feature alignment, leading to suboptimal results. Semi-supervised learning (SSL) has been explored as a solution, but is limited by the quality of pseudo-labels generated by the model. Inspired by the theoretical foundations in domain adaptation [2], we propose a new SSL approach that opts for selecting target samples whose model output from a domain-specific teacher and student network disagree on the unlabelled target data, in an effort to boost the target domain performance. Extensive experiments on benchmark cross-domain OTE datasets show that this approach is effective and performs consistently well in settings with large domain shifts.

Dynalogue: A Transformer-Based Dialogue System with Dynamic Attention

Businesses face a range of cyber risks, both external threats and internal vulnerabilities that continue to evolve over time. As cyber attacks continue to increase in complexity and sophistication, more organisations will experience them. For this reason, it is important that organisations seek timely consultancy from cyber professionals so that they can respond to and recover from cyber attacks as quickly as possible. However, huge surges in cyber attacks have long left cyber professionals short of what is required to cover the security needs. This problem is getting worse when an increasing number of people choose to work from home during the pandemic because this situation usually yields extra communication cost.

In this paper, we propose to develop a cybersecurity-oriented dialogue system, called Dynalogue1, which can provide consultancy online as a cyber professional. For the first time, Dynalogue provides a promising solution to mitigate the need for cyber professionals via automatically generating problem-targeted conversions to victims of cyber attacks. In spite of many dialogue systems developed in the past, Dynalogue provides a distinct capability of handling long and complicated sentences that are common in cybersecurity-related conversations. It is challenging to have this capability because limited memory in dialogue systems can be hard to accommodate sufficient key information of long sentences. To overcome this challenge, Dynalogue utilises an attention mechanism that dynamically captures key semantics within a sentence instead of using fix window to cut off the sentence. To evaluate Dynalogue, we collect 67K real-world conversations (0.6M utterances) from Bleeping Computer2, which is one of the most popular cybersecurity consultancy websites in the world. The results suggest that Dynalogue outperforms all the existing dialogue systems with 1% ∼ 9% improvements on all different metrics. We further run Dynalogue on the public dataset WikiHow to validate its compatibility in other domains where conversations are also long and complicated. Dynalogue also outperforms all the other methods with at most 2.4% improvement.

Active Learning from the Web

Labeling data is one of the most costly processes in machine learning pipelines. Active learning is a standard approach to alleviating this problem. Pool-based active learning first builds a pool of unlabelled data and iteratively selects data to be labeled so that the total number of required labels is minimized, keeping the model performance high. Many effective criteria for choosing data from the pool have been proposed in the literature. However, how to build the pool is less explored. Specifically, most of the methods assume that a task-specific pool is given for free. In this paper, we advocate that such a task-specific pool is not always available and propose the use of a myriad of unlabelled data on the Web for the pool for which active learning is applied. As the pool is extremely large, it is likely that relevant data exist in the pool for many tasks, and we do not need to explicitly design and build the pool for each task. The challenge is that we cannot compute the acquisition scores of all data exhaustively due to the size of the pool. We propose an efficient method, Seafaring, to retrieve informative data in terms of active learning from the Web using a user-side information retrieval algorithm. In the experiments, we use the online Flickr environment as the pool for active learning. This pool contains more than ten billion images and is several orders of magnitude larger than the existing pools in the literature for active learning. We confirm that our method performs better than existing approaches of using a small unlabelled pool.

The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study

Due to the exponential growth of scientific publications on the Web, there is a pressing need to tag each paper with fine-grained topics so that researchers can track their interested fields of study rather than drowning in the whole literature. Scientific literature tagging is beyond a pure multi-label text classification task because papers on the Web are prevalently accompanied by metadata information such as venues, authors, and references, which may serve as additional signals to infer relevant tags. Although there have been studies making use of metadata in academic paper classification, their focus is often restricted to one or two scientific fields (e.g., computer science and biomedicine) and to one specific model. In this work, we systematically study the effect of metadata on scientific literature tagging across 19 fields. We select three representative multi-label classifiers (i.e., a bag-of-words model, a sequence-based model, and a pre-trained language model) and explore their performance change in scientific literature tagging when metadata are fed to the classifiers as additional features. We observe some ubiquitous patterns of metadata’s effects across all fields (e.g., venues are consistently beneficial to paper tagging in almost all cases), as well as some unique patterns in fields other than computer science and biomedicine, which are not explored in previous studies.

Fast and Multi-aspect Mining of Complex Time-stamped Event Streams

Given a huge, online stream of time-evolving events with multiple attributes, such as online shopping logs: (item, price, brand, time), how can we summarize large, dynamic high-order tensor streams? How can we see any hidden patterns, rules, and anomalies? Our answer is to focus on two types of patterns, i.e., “regimes” and “components”, over high-order tensor streams, for which we present an efficient and effective method, namely CubeScope. Specifically, it identifies any sudden discontinuity and recognizes distinct dynamical patterns, “regimes” (e.g., weekday/weekend/holiday patterns). In each regime, it also performs multi-way summarization for all attributes (e.g., item, price, brand, and time) and discovers hidden “components” representing latent groups (e.g., item/brand groups) and their relationship. Thanks to its concise but effective summarization, CubeScope can also detect the sudden appearance of anomalies and identify the types of anomalies that occur in practice.

Our proposed method has the following properties: (a) Effective: it captures dynamical multi-aspect patterns, i.e., regimes and components, and statistically summarizes all the events; (b) General: it is practical for successful application to data compression, pattern discovery, and anomaly detection on various types of tensor streams; (c) Scalable: our algorithm does not depend on the length of the data stream and its dimensionality. Extensive experiments on real datasets demonstrate that CubeScope finds meaningful patterns and anomalies correctly, and consistently outperforms the state-of-the-art methods as regards accuracy and execution speed.

PDSum: Prototype-driven Continuous Summarization of Evolving Multi-document Sets Stream

Summarizing text-rich documents has been long studied in the literature, but most of the existing efforts have been made to summarize a static and predefined multi-document set. With the rapid development of online platforms for generating and distributing text-rich documents, there arises an urgent need for continuously summarizing dynamically evolving multi-document sets where the composition of documents and sets is changing over time. This is especially challenging as the summarization should be not only effective in incorporating relevant, novel, and distinctive information from each concurrent multi-document set, but also efficient in serving online applications. In this work, we propose a new summarization problem, Evolving Multi-Document sets stream Summarization (EMDS), and introduce a novel unsupervised algorithm PDSum with the idea of prototype-driven continuous summarization. PDSum builds a lightweight prototype of each multi-document set and exploits it to adapt to new documents while preserving accumulated knowledge from previous documents. To update new summaries, the most representative sentences for each multi-document set are extracted by measuring their similarities to the prototypes. A thorough evaluation with real multi-document sets streams demonstrates that PDSum outperforms state-of-the-art unsupervised multi-document summarization algorithms in EMDS in terms of relevance, novelty, and distinctiveness and is also robust to various evaluation settings.

“Why is this misleading?”: Detecting News Headline Hallucinations with Explanations

Automatic headline generation enables users to comprehend ongoing news events promptly and has recently become an important task in web mining and natural language processing. With the growing need for news headline generation, we argue that the hallucination issue, namely the generated headlines being not supported by the original news stories, is a critical challenge for the deployment of this feature in web-scale systems Meanwhile, due to the infrequency of hallucination cases and the requirement of careful reading for raters to reach the correct consensus, it is difficult to acquire a large dataset for training a model to detect such hallucinations through human curation. In this work, we present a new framework named ExHalder to address this challenge for headline hallucination detection. ExHalder adapts the knowledge from public natural language inference datasets into the news domain and learns to generate natural language sentences to explain the hallucination detection results. To evaluate the model performance, we carefully collect a dataset with more than six thousand labeled ⟨ article, headline⟩ pairs. Extensive experiments on this dataset and another six public ones demonstrate that ExHalder can identify hallucinated headlines accurately and justifies its predictions with human-readable natural language explanations.

DIWIFT: Discovering Instance-wise Influential Features for Tabular Data

Tabular data is one of the most common data storage formats behind many real-world web applications such as retail, banking, and e-commerce. The success of these web applications largely depends on the ability of the employed machine learning model to accurately distinguish influential features from all the predetermined features in tabular data. Intuitively, in practical business scenarios, different instances should correspond to different sets of influential features, and the set of influential features of the same instance may vary in different scenarios. However, most existing methods focus on global feature selection assuming that all instances have the same set of influential features, and few methods considering instance-wise feature selection ignore the variability of influential features in different scenarios. In this paper, we first introduce a new perspective based on the influence function for instance-wise feature selection, and give some corresponding theoretical insights, the core of which is to use the influence function as an indicator to measure the importance of an instance-wise feature. We then propose a new solution for discovering instance-wise influential features in tabular data (DIWIFT), where a self-attention network is used as a feature selection model and the value of the corresponding influence function is used as an optimization objective to guide the model. Benefiting from the advantage of the influence function, i.e., its computation does not depend on a specific architecture and can also take into account the data distribution in different scenarios, our DIWIFT has better flexibility and robustness. Finally, we conduct extensive experiments on both synthetic and real-world datasets to validate the effectiveness of our DIWIFT.

Learning Structural Co-occurrences for Structured Web Data Extraction in Low-Resource Settings

Extracting structured information from all manner of webpages is an important problem with the potential to automate many real-world applications. Recent work has shown the effectiveness of leveraging DOM trees and pre-trained language models to describe and encode webpages. However, they typically optimize the model to learn the semantic co-occurrence of elements and labels in the same webpage, thus their effectiveness depends on sufficient labeled data, which is labor-intensive. In this paper, we further observe structural co-occurrences in different webpages of the same website: the same position in the DOM tree usually plays the same semantic role, and the DOM nodes in this position also share similar surface forms. Motivated by this, we propose a novel method, Structor, to effectively incorporate the structural co-occurrences over DOM tree and surface form into pre-trained language models. Such structural co-occurrences help the model learn the task better under low-resource settings, and we study two challenging experimental scenarios: website-level low-resource setting and webpage-level low-resource setting, to evaluate our approach. Extensive experiments on the public SWDE dataset show that Structor significantly outperforms the state-of-the-art models in both settings, and even achieves three times the performance of the strong baseline model in the case of extreme lack of training data.

Learning Disentangled Representation via Domain Adaptation for Dialogue Summarization

Dialogue summarization, which aims to generate a summary for an input dialogue, plays a vital role in intelligent dialogue systems. The end-to-end models have achieved satisfactory performance in summarization, but the success is built upon enough annotated data, which is costly to obtain, especially in the dialogue summarization. To leverage the rich external data, previous works first pre-train the model on the other domain data (e.g., the news domain), and then fine-tune it directly on the dialogue domain. The data from different domains are equally treated during the training process, while the vast differences between dialogues (usually informal, repetitive, and with multiple speakers) and conventional articles (usually formal and concise) are neglected. In this work, we propose to use a disentangled representation method to reduce the deviation between data in different domains, where the input data is disentangled into domain-invariant and domain-specific representations. The domain-invariant representation carries context information that is supposed to be the same across domains (e.g., news, dialogue) and the domain-specific representation indicates the input data belongs to a particular domain. We use adversarial learning and contrastive learning to constrain the disentangled representations to the target space. Furthermore, we propose two novel reconstruction strategies, namely backtracked and cross-track reconstructions, which aim to reduce the domain characteristics of out-of-domain data and mitigate the domain bias of the model. Experimental results on three public datasets show that our model significantly outperforms the strong baselines.

XWikiGen: Cross-lingual Summarization for Encyclopedic Text Generation in Low Resource Languages

Lack of encyclopedic text contributors, especially on Wikipedia, makes automated text generation for low resource (LR) languages a critical problem. Existing work on Wikipedia text generation has focused on English only where English reference articles are summarized to generate English Wikipedia pages. But, for low-resource languages, the scarcity of reference articles makes monolingual summarization ineffective in solving this problem. Hence, in this work, we propose XWikiGen, which is the task of cross-lingual multi-document summarization of text from multiple reference articles, written in various languages, to generate Wikipedia-style text. Accordingly, we contribute a benchmark dataset, XWikiRef, spanning ∼ 69K Wikipedia articles covering five domains and eight languages. We harness this dataset to train a two-stage system where the input is a set of citations and a section title and the output is a section-specific LR summary. The proposed system is based on a novel idea of neural unsupervised extractive summarization to coarsely identify salient information followed by a neural abstractive model to generate the section-specific text. Extensive experiments show that multi-domain training is better than the multi-lingual setup on average. We make our code and dataset publicly available1.

TMMDA: A New Token Mixup Multimodal Data Augmentation for Multimodal Sentiment Analysis

Existing methods for Multimodal Sentiment Analysis (MSA) mainly focus on integrating multimodal data effectively on limited multimodal data. Learning more informative multimodal representation often relies on large-scale labeled datasets, which are difficult and unrealistic to obtain. To learn informative multimodal representation on limited labeled datasets as more as possible, we proposed TMMDA for MSA, a new Token Mixup Multimodal Data Augmentation, which first generates new virtual modalities from the mixed token-level representation of raw modalities, and then enhances the representation of raw modalities by utilizing the representation of the generated virtual modalities. To preserve semantics during virtual modality generation, we propose a novel cross-modal token mixup strategy based on the generative adversarial network. Extensive experiments on two benchmark datasets, i.e., CMU-MOSI and CMU-MOSEI, verify the superiority of our model compared with several state-of-the-art baselines. The code is available at https://github.com/xiaobaicaihhh/TMMDA.

Node-wise Diffusion for Scalable Graph Learning

Graph Neural Networks (GNNs) have shown superior performance for semi-supervised learning of numerous web applications, such as classification on web services and pages, analysis of online social networks, and recommendation in e-commerce. The state of the art derives representations for all nodes in graphs following the same diffusion (message passing) model without discriminating their uniqueness. However, (i) labeled nodes involved in model training usually account for a small portion of graphs in the semi-supervised setting, and (ii) different nodes locate at different graph local contexts and it inevitably degrades the representation qualities if treating them undistinguishedly in diffusion.

To address the above issues, we develop NDM, a universal node-wise diffusion model, to capture the unique characteristics of each node in diffusion, by which NDM is able to yield high-quality node representations. In what follows, we customize NDM for semi-supervised learning and design the NIGCN model. In particular, NIGCN advances the efficiency significantly since it (i) produces representations for labeled nodes only and (ii) adopts well-designed neighbor sampling techniques tailored for node representation generation. Extensive experimental results on various types of web datasets, including citation, social and co-purchasing graphs, not only verify the state-of-the-art effectiveness of NIGCN but also strongly support the remarkable scalability of NIGCN. In particular, NIGCN completes representation generation and training within 10 seconds on the dataset with hundreds of millions of nodes and billions of edges, up to orders of magnitude speedups over the baselines, while achieving the highest F1-scores on classification.

BlinkViz: Fast and Scalable Approximate Visualization on Very Large Datasets using Neural-Enhanced Mixed Sum-Product Networks

Web-based online interactive visual analytics enjoys popularity in recent years. Traditionally, visualizations are produced directly from querying the underlying data. However, for a very large dataset, this way is so time-consuming that it cannot meet the low-latency requirements of interactive visual analytics. In this paper, we propose a learning-based visualization approach called BlinkViz, which uses a learned model to produce approximate visualizations by leveraging mixed sum-product networks to learn the distribution of the original data. In such a way, it makes visualization faster and more scalable by decoupling visualization and data. In addition, to improve the accuracy of approximate visualizations, we propose an enhanced model by incorporating a neural network with residual structures, which can refine prediction results, especially for visual requests with low selectivity. Extensive experiments show that BlinkViz is extremely fast even on a large dataset with hundreds of millions of data records (over 30GB), responding in sub-seconds (from 2ms to less than 500ms for different requests) while keeping a low error rate. Furthermore, our approach remains scalable on latency and memory footprint size regardless of data size.

MetaTroll: Few-shot Detection of State-Sponsored Trolls with Transformer Adapters

State-sponsored trolls are the main actors of influence campaigns on social media and automatic troll detection is important to combat misinformation at scale. Existing troll detection models are developed based on training data for known campaigns (e.g. the influence campaign by Russia’s Internet Research Agency on the 2016 US Election), and they fall short when dealing with novel campaigns with new targets. We propose MetaTroll, a text-based troll detection model based on the meta-learning framework that enables high portability and parameter-efficient adaptation to new campaigns using only a handful of labelled samples for few-shot transfer. We introduce campaign-specific transformer adapters to MetaTroll to “memorise” campaign-specific knowledge so as to tackle catastrophic forgetting, where a model “forgets” how to detect trolls from older campaigns due to continual adaptation. Our experiments demonstrate that MetaTroll substantially outperforms baselines and state-of-the-art few-shot text classification models. Lastly, we explore simple approaches to extend MetaTroll to multilingual and multimodal detection. Source code for MetaTroll is available at: https://github.com/ltian678/metatroll-code.git

EmpMFF: A Multi-factor Sequence Fusion Framework for Empathetic Response Generation

Empathy is one of the fundamental abilities of dialog systems. In order to build more intelligent dialogue systems, it’s important to learn how to demonstrate empathy toward others. Existing studies focus on identifying and leveraging the user’s coarse emotion to generate empathetic responses. However, human emotion and dialog act (e.g., intent) evolve as the talk goes along in an empathetic dialogue. This leads to the generated responses with very different intents from the human responses. As a result, empathy failure is ultimately caused. Therefore, using fine-grained emotion and intent sequential data on conversational emotions and dialog act is crucial for empathetic response generation. On the other hand, existing empathy models overvalue the empathy of responses while ignoring contextual relevance, which results in repetitive model-generated responses. To address these issues, we propose a Multi-Factor sequence Fusion framework (EmpMFF) based on conditional variational autoencoder. To generate empathetic responses, the proposed EmpMFF encodes a combination of contextual, emotion, and intent information into a continuous latent variable, which is then fed into the decoder. Experiments on the EmpatheticDialogues benchmark dataset demonstrate that EmpMFF exhibits exceptional performance in both automatic and human evaluations.

Automatic Feature Selection By One-Shot Neural Architecture Search In Recommendation Systems

Feature selection is crucial in large-scale recommendation system, which can not only reduce the computational cost, but also improve the recommendation efficiency. Most existing works rank the features and then select the top-k ones as the final feature subset. However, they assess feature importance individually and ignore the interrelationship between features. Consequently, multiple features with high relevance may be selected simultaneously, resulting in sub-optimal result. In this work, we solve this problem by proposing an AutoML-based feature selection framework that can automatically search the optimal feature subset. Specifically, we first embed the search space into a weight-sharing Supernet. Then, a two-stage neural architecture search method is employed to evaluate the feature quality. In the first stage, a well-designed sampling method considering feature convergence fairness is applied to train the Supernet. In the second stage, a reinforcement learning method is used to search for the optimal feature subset efficiently. The Experimental results on two real datasets demonstrate the superior performance of new framework over other solutions. Our proposed method obtain significant improvement with a 20% reduction in the amount of features on the Criteo. More validation experiments demonstrate the ability and robustness of the framework.

Towards Understanding Consumer Healthcare Questions on the Web with Semantically Enhanced Contrastive Learning

In recent years, seeking health information on the web has become a preferred way for healthcare consumers to support their information needs. Generally, healthcare consumers use long and detailed questions with several peripheral details to express their healthcare concerns, contributing to natural language understanding challenges. One way to address this challenge is by summarizing the questions. However, most of the existing abstractive summarization systems generate impeccably fluent yet factually incorrect summaries. In this paper, we present a semantically-enhanced contrastive learning-based framework for generating abstractive question summaries that are faithful and factually correct. We devised multiple strategies based on question semantics to generate the erroneous (negative) summaries, such that the model has the understanding of plausible and incorrect perturbations of the original summary. Our extensive experimental results on two benchmark consumer health question summarization datasets confirm the effectiveness of our proposed method by achieving state-of-the-art performance and generating factually correct and fluent summaries, as measured by human evaluation.

CEIL: A General Classification-Enhanced Iterative Learning Framework for Text Clustering

Text clustering, as one of the most fundamental challenges in unsupervised learning, aims at grouping semantically similar text segments without relying on human annotations. With the rapid development of deep learning, deep clustering has achieved significant advantages over traditional clustering methods. Despite the effectiveness, most existing deep text clustering methods rely heavily on representations pre-trained in general domains, which may not be the most suitable solution for clustering in specific target domains. To address this issue, we propose CEIL, a novel Classification-Enhanced Iterative Learning framework for short text clustering, which aims at generally promoting the clustering performance by introducing a classification objective to iteratively improve feature representations. In each iteration, we first adopt a language model to retrieve the initial text representations, from which the clustering results are collected using our proposed Category Disentangled Contrastive Clustering (CDCC) algorithm. After strict data filtering and aggregation processes, samples with clean category labels are retrieved, which serve as supervision information to update the language model with the classification objective via a prompt learning approach. Finally, the updated language model with improved representation ability is used to enhance clustering in the next iteration. Extensive experiments demonstrate that the CEIL framework significantly improves the clustering performance over iterations, and is generally effective on various clustering algorithms. Moreover, by incorporating CEIL on CDCC, we achieve the state-of-the-art clustering performance on a wide range of short text clustering benchmarks outperforming other strong baseline methods.

Modeling Dynamic Interactions over Tensor Streams

Many web applications, such as search engines and social network services, are continuously producing a huge number of events with a multi-order tensor form, {count;query, location, …, timestamp}, and so how can we discover important trends to enables us to forecast long-term future events? Can we interpret any relationships between events that determine the trends from multi-aspect perspectives? Real-world online activities can be composed of (1) many time-changing interactions that control trends, for example, competition/cooperation to gain user attention, as well as (2) seasonal patterns that covers trends. To model the shifting trends via interactions, namely dynamic interactions over tensor streams, in this paper, we propose a streaming algorithm, DISMO, that we designed to discover Dynamic Interactions and Seasonality in a Multi-Order tensor. Our approach has the following properties. (a) Interpretable: it incorporates interpretable non-linear differential equations in tensor factorization so that it can reveal latent interactive relationships and thus generate future events effectively; (b) Dynamic: it can be aware of shifting trends by switching multi-aspect factors while summarizing their characteristics incrementally; and (c) Automatic: it finds every factor automatically without losing forecasting accuracy. Extensive experiments on real datasets demonstrate that our algorithm extracts interpretable interactions between data attributes, while simultaneously providing improved forecasting accuracy and a great reduction in computational time.

Semi-supervised Adversarial Learning for Complementary Item Recommendation

Complementary item recommendations are a ubiquitous feature of modern e-commerce sites. Such recommendations are highly effective when they are based on collaborative signals like co-purchase statistics. In certain online marketplaces, however, e.g., on online auction sites, constantly new items are added to the catalog. In such cases, complementary item recommendations are often based on item side-information due to a lack of interaction data. In this work, we propose a novel approach that can leverage both item side-information and labeled complementary item pairs to generate effective complementary recommendations for cold items, i.e., for items for which no co-purchase statistics yet exist. Given that complementary items typically have to be of a different category than the seed item, we technically maintain a latent space for each item category. Simultaneously, we learn to project distributed item representations into these category spaces to determine suitable recommendations. The main learning process in our architecture utilizes labeled pairs of complementary items. In addition, we adopt ideas from Cycle Generative Adversarial Networks (CycleGAN) to leverage available item information even in case no labeled data exists for a given item and category. Experiments on three e-commerce datasets show that our method is highly effective.

Interval-censored Transformer Hawkes: Detecting Information Operations using the Reaction of Social Systems

Social media is being increasingly weaponized by state-backed actors to elicit reactions, push narratives and sway public opinion. These are known as Information Operations (IO). The covert nature of IO makes their detection difficult. This is further amplified by missing data due to the user and content removal and privacy requirements. This work advances the hypothesis that the very reactions that Information Operations seek to elicit within the target social systems can be used to detect them. We propose an Interval-censored Transformer Hawkes (IC-TH) architecture and a novel data encoding scheme to account for both observed and missing data. We derive a novel log-likelihood function that we deploy together with a contrastive learning procedure. We showcase the performance of IC-TH on three real-world Twitter datasets and two learning tasks: future popularity prediction and item category prediction. The latter is particularly significant. Using the retweeting timing and patterns solely, we can predict the category of YouTube videos, guess whether news publishers are reputable or controversial and, most importantly, identify state-backed IO agent accounts. Additional qualitative investigations uncover that the automatically discovered clusters of Russian-backed agents appear to coordinate their behavior, activating simultaneously to push specific narratives.

Constrained Subset Selection from Data Streams for Profit Maximization

The problem of constrained subset selection from a large data stream for profit maximization has many applications in web data mining and machine learning, such as social advertising, team formation and recommendation systems. Such a problem can be formulated as maximizing a regularized submodular function under certain constraints. In this paper, we consider a generalized k-system constraint, which captures various requirements in real-world applications. For this problem, we propose the first streaming algorithm with provable performance bounds, leveraging a novel multitudinous distorted filter framework. The empirical performance of our algorithm is extensively evaluated in several applications including web data mining and recommendation systems, and the experimental results demonstrate the superiorities of our algorithm in terms of both effectiveness and efficiency.

Towards Model Robustness: Generating Contextual Counterfactuals for Entities in Relation Extraction

The goal of relation extraction (RE) is to extract the semantic relations between/among entities in the text. As a fundamental task in information systems, it is crucial to ensure the robustness of RE models. Despite the high accuracy current deep neural models have achieved in RE tasks, they are easily affected by spurious correlations. One solution to this problem is to train the model with counterfactually augmented data (CAD) such that it can learn the causation rather than the confounding. However, no attempt has been made on generating counterfactuals for RE tasks.

In this paper, we formulate the problem of automatically generating CAD for RE tasks from an entity-centric viewpoint, and develop a novel approach to derive contextual counterfactuals for entities. Specifically, we exploit two elementary topological properties, i.e., the centrality and the shortest path, in syntactic and semantic dependency graphs, to first identify and then intervene on the contextual causal features for entities. We conduct a comprehensive evaluation on four RE datasets by combining our proposed approach with a variety of RE backbones. Results prove that our approach not only improves the performance of the backbones but also makes them more robust in the out-of-domain test  1.

CitationSum: Citation-aware Graph Contrastive Learning for Scientific Paper Summarization

Citation graphs can be helpful in generating high-quality summaries of scientific papers, where references of a scientific paper and their correlations can provide additional knowledge for contextualising its background and main contributions. Despite the promising contributions of citation graphs, it is still challenging to incorporate them into summarization tasks. This is due to the difficulty of accurately identifying and leveraging relevant content in references for a source paper, as well as capturing their correlations of different intensities. Existing methods either ignore references or utilize only abstracts indiscriminately from them, failing to tackle the challenge mentioned above. To fill that gap, we propose a novel citation-aware scientific paper summarization framework based on the citation graph, able to accurately locate and incorporate the salient contents from references, as well as capture varying relevance between source papers and their references. Specifically, we first build a domain-specific dataset PubMedCite with about 192K biomedical scientific papers and a large citation graph preserving 917K citation relationships between them. It is characterized by preserving the salient contents extracted from full texts of references, and the weighted correlation between the salient contents of references and the source paper. Based on it, we design a self-supervised citation-aware summarization framework (CitationSum) with graph contrastive learning, which boosts the summarization generation by efficiently fusing the salient information in references with source paper contents under the guidance of their correlations. Experimental results show that our model outperforms the state-of-the-art methods, due to efficiently leveraging the information of references and citation correlations.

SCStory: Self-supervised and Continual Online Story Discovery

We present a framework SCStory for online story discovery, that helps people digest rapidly published news article streams in real-time without human annotations. To organize news article streams into stories, existing approaches directly encode the articles and cluster them based on representation similarity. However, these methods yield noisy and inaccurate story discovery results because the generic article embeddings do not effectively reflect the story-indicative semantics in an article and cannot adapt to the rapidly evolving news article streams. SCStory employs self-supervised and continual learning with a novel idea of story-indicative adaptive modeling of news article streams. With a lightweight hierarchical embedding module that first learns sentence representations and then article representations, SCStory identifies story-relevant information of news articles and uses them to discover stories. The embedding module is continuously updated to adapt to evolving news streams with a contrastive learning objective, backed up by two unique techniques, confidence-aware memory replay and prioritized-augmentation, employed for label absence and data scarcity problems. Thorough experiments on real and the latest news data sets demonstrate that SCStory outperforms existing state-of-the-art algorithms for unsupervised online story discovery.

Set in Stone: Analysis of an Immutable Web3 Social Media Platform

There has been growing interest in the so-called “Web3” movement. This loosely refers to a mix of decentralized technologies, often underpinned by blockchain technologies. Among these, Web3 social media platforms have begun to emerge. These store all social interaction data (e.g., posts) on a public ledger, removing the need for centralized data ownership and management. But this comes at a cost, which some argue is prohibitively expensive. As an exemplar within this growing ecosytem, we explore memo.cash, a microblogging service built on the Bitcoin Cash (BCH) blockchain. We gather data for 24K users, 317K posts, 2.57M user actions, which have facilitated $6.75M worth of transactions. A particularly unique feature is that users must pay BCH tokens for each interaction (e.g., posting, following). We study how this may impact the social makeup of the platform. We therefore study memo.cash as both a social network and a transaction platform.

Show me your NFT and I tell you how it will perform: Multimodal representation learning for NFT selling price prediction

Non-Fungible Tokens (NFTs) represent deeds of ownership, based on blockchain technologies and smart contracts, of unique crypto assets on digital art forms (e.g., artworks or collectibles). In the spotlight after skyrocketing in 2021, NFTs have attracted the attention of crypto enthusiasts and investors intent on placing promising investments in this profitable market. However, the NFT financial performance prediction has not been widely explored to date.

In this work, we address the above problem based on the hypothesis that NFT images and their textual descriptions are essential proxies to predict the NFT selling prices. To this purpose, we propose MERLIN, a novel multimodal deep learning framework designed to train Transformer-based language and visual models, along with graph neural network models, on collections of NFTs’ images and texts. A key aspect in MERLIN is its independence on financial features, as it exploits only the primary data a user interested in NFT trading would like to deal with, i.e., NFT images and textual descriptions. By learning dense representations of such data, a price-category classification task is performed by MERLIN models, which can also be tuned according to user preferences in the inference phase to mimic different risk-return investment profiles. Experimental evaluation on a publicly available dataset has shown that MERLIN models achieve significant performances according to several financial assessment criteria, fostering profitable investments, and also beating baseline machine-learning classifiers based on financial features.

Catch: Collaborative Feature Set Search for Automated Feature Engineering

Feature engineering often plays a crucial role in building mining systems for tabular data, which traditionally requires experienced human experts to perform. Thanks to the rapid advances in reinforcement learning, it has offered an automated alternative, i.e. automated feature engineering (AutoFE). In this work, through scrutiny of the prior AutoFE methods, we characterize several research challenges that remained in this regime, concerning system-wide efficiency, efficacy, and practicality toward production. We then propose Catch, a full-fledged new AutoFE framework that comprehensively addresses the aforementioned challenges. The core to Catch composes a hierarchical-policy reinforcement learning scheme that manifests a collaborative feature engineering exploration and exploitation grounded on the granularity of the whole feature set. At a higher level of the hierarchy, a decision-making module controls the post-processing of the attained feature engineering transformation. We extensively experiment with Catch on 26 academic standardized tabular datasets and 9 industrialized real-world datasets. Measured by numerous metrics and analyses, Catch establishes a new state-of-the-art, from perspectives performance, latency as well as its practicality towards production. Source code1 can be found at https://github.com/1171000709/Catch.

CoTel: Ontology-Neural Co-Enhanced Text Labeling

The success of many web services relies on the large-scale domain-specific high-quality labeled dataset. Insufficient public datasets motivate us to reduce the cost of data labeling while maintaining high accuracy in support of intelligent web applications. The rule-based method and the learning-based method are common techniques for labeling. In this work, we study how to utilize the rule-based and learning-based methods for resource-effective text labeling. We propose CoTel, the first ontology-neural co-enhanced framework for text labeling. We propose critical ontology extraction in the rule-based module and ontology-enhanced loss prediction in the learning-based module. CoTel can integrate explicit labeling rules and implicit labeling models and make them help each other to improve resource efficiency in text labeling tasks. We evaluate CoTel on both public datasets and real applications with three different tasks. Compared with the baseline, CoTel can reduce the time cost by 64.75% (a 2.84× speedup) and the number of labeling by 62.07%.

Extracting Cultural Commonsense Knowledge at Scale

Structured knowledge is important for many AI applications. Commonsense knowledge, which is crucial for robust human-centric AI, is covered by a small number of structured knowledge projects. However, they lack knowledge about human traits and behaviors conditioned on socio-cultural contexts, which is crucial for situative AI. This paper presents Candle, an end-to-end methodology for extracting high-quality cultural commonsense knowledge (CCSK) at scale. Candle extracts CCSK assertions from a huge web corpus and organizes them into coherent clusters, for 3 domains of subjects (geography, religion, occupation) and several cultural facets (food, drinks, clothing, traditions, rituals, behaviors). Candle includes judicious techniques for classification-based filtering and scoring of interestingness. Experimental evaluations show the superiority of the Candle CCSK collection over prior works, and an extrinsic use case demonstrates the benefits of CCSK for the GPT-3 language model. Code and data can be accessed at https://candle.mpi-inf.mpg.de/.

Know Your Transactions: Real-time and Generic Transaction Semantic Representation on Blockchain & Web3 Ecosystem

Web3, based on blockchain technology, is the evolving next generation Internet of value. Massive active applications on Web3, e.g. DeFi and NFT, usually rely on blockchain transactions to achieve value transfer as well as complex and diverse custom logic and intentions. Various risky or illegal behaviors such as financial fraud, hacking, money laundering are currently rampant in the blockchain ecosystem, and it is thus important to understand the intent behind the pseudonymous transactions. To reveal the intent of transactions, much effort has been devoted to extracting some particular transaction semantics through specific expert experiences. However, the limitations of existing methods in terms of effectiveness and generalization make it difficult to extract diverse transaction semantics in the rapidly growing and evolving Web3 ecosystem. In this paper, we propose the Motif-based Transaction Semantics representation method (MoTS), which can capture the transaction semantic information in the real-time transaction data workflow. To the best of our knowledge, MoTS is the first general semantic extraction method in Web3 blockchain ecosystem. Experimental results show that MoTS can effectively distinguish different transaction semantics in real-time, and can be used for various downstream tasks, giving new insights to understand the Web3 blockchain ecosystem. Our codes are available at https://github.com/wuzhy1ng/MoTS.

Toward Open-domain Slot Filling via Self-supervised Co-training

Slot filling is one of the critical tasks in modern conversational systems. The majority of existing literature employs supervised learning methods, which require labeled training data for each new domain. Zero-shot learning and weak supervision approaches, among others, have shown promise as alternatives to manual labeling. Nonetheless, these learning paradigms are significantly inferior to supervised learning approaches in terms of performance. To minimize this performance gap and demonstrate the possibility of open-domain slot filling, we propose a Self-supervised Co-training framework, called , that requires zero in-domain manually labeled training examples and works in three phases. Phase one acquires two sets of complementary pseudo labels automatically. Phase two leverages the power of the pre-trained language model BERT, by adapting it for the slot filling task using these sets of pseudo labels. In phase three, we introduce a self-supervised co-training mechanism, where both models automatically select high-confidence soft labels to further improve the performance of the other in an iterative fashion. Our thorough evaluations show that outperforms state-of-the-art models by 45.57% and 37.56% on SGD and MultiWoZ datasets, respectively. Moreover, our proposed framework achieves comparable performance when compared to state-of-the-art fully supervised models.

A Multi-view Meta-learning Approach for Multi-modal Response Generation

As massive conversation examples are easily accessible on the Internet, we are now able to organize large-scale conversation corpora to build chatbots in a data-driven manner. Multi-modal social chatbots produce conversational utterances according to both textual utterances and vision signals. Due to the difficulty of bridging different modalities, the dialogue generation model of chatbots falls into local minima that only capture the mapping between textual input and textual output, as a result, it almost ignores the non-textual signals. Further, similar to the dialogue model with plain text as input and output, the generated responses from multi-modal dialogue also lack diversity and informativeness. In this paper, to address the above issues, we propose a Multi-View Meta-Learning (MultiVML) algorithm that groups samples in multiple views and customizes generation models to different groups. We employ a multi-view clustering to group the training samples so as to attend more to the unique information in non-textual modality. Tailoring different sets of model parameters for each group boosts the genereation diversity via meta-learning. We evaluate MultiVML on two variants of the OpenViDial benchmark datasets. The experiments show that our model not only better explore the information from multiple modalities, but also excels baselines in both quality and diversity.

Unsupervised Event Chain Mining from Multiple Documents

Massive and fast-evolving news articles keep emerging on the web. To effectively summarize and provide concise insights into real-world events, we propose a new event knowledge extraction task Event Chain Mining in this paper. Given multiple documents about a super event, it aims to mine a series of salient events in temporal order. For example, the event chain of super event Mexico Earthquake in 2017 is {earthquake hit Mexico, destroy houses, kill people, block roads}. This task can help readers capture the gist of texts quickly, thereby improving reading efficiency and deepening text comprehension. To address this task, we regard an event as a cluster of different mentions of similar meanings. In this way, we can identify the different expressions of events, enrich their semantic knowledge and replenish relation information among them. Taking events as the basic unit, we present a novel unsupervised framework, EMiner. Specifically, we extract event mentions from texts and merge them with similar meanings into a cluster as a single event. By jointly incorporating both content and commonsense, essential events are then selected and arranged chronologically to form an event chain. Meanwhile, we annotate a multi-document benchmark to build a comprehensive testbed for the proposed task. Extensive experiments are conducted to verify the effectiveness of EMiner in terms of both automatic and human evaluations.

Interactive Log Parsing via Light-weight User Feedback

Template mining is one of the foundational tasks to support log analysis, which supports the diagnosis and troubleshooting of large scale Web applications. This paper develops a human-in-the-loop template mining framework to support interactive log analysis, which is highly desirable in real-world diagnosis or troubleshooting of Web applications but yet previous template mining algorithms fail to support it. We formulate three types of light-weight user feedback and based on them we design three atomic human-in-the-loop template mining algorithms. We derive mild conditions under which the outputs of our proposed algorithms are provably correct. We also derive upper bounds on the computational complexity and query complexity of each algorithm. We demonstrate the versatility of our proposed algorithms by combining them to improve the template mining accuracy of five representative algorithms over sixteen widely used benchmark datasets.

SESSION: Security, Privacy & Trust

Measuring and Evading Turkmenistan’s Internet Censorship: A Case Study in Large-Scale Measurements of a Low-Penetration Country

Since 2006, Turkmenistan has been listed as one of the few Internet enemies by Reporters without Borders due to its extensively censored Internet and strictly regulated information control policies. Existing reports of filtering in Turkmenistan rely on a handful of vantage points or test a small number of websites. Yet, the country’s poor Internet adoption rates and small population can make more comprehensive measurement challenging. With a population of only six million people and an Internet penetration rate of only 38%, it is challenging to either recruit in-country volunteers or obtain vantage points to conduct remote network measurements at scale.

We present the largest measurement study to date of Turkmenistan’s Web censorship. To do so, we developed TMC, which tests the blocking status of millions of domains across the three foundational protocols of the Web (DNS, HTTP, and HTTPS). Importantly, TMC does not require access to vantage points in the country. We apply TMC to 15.5M domains, our results reveal that Turkmenistan censors more than 122K domains, using different blocklists for each protocol. We also reverse-engineer these censored domains, identifying 6K over-blocking rules causing incidental filtering of more than 5.4M domains. Finally, we use , an open-source censorship evasion tool, to discover five new censorship evasion strategies that can defeat Turkmenistan’s censorship at both transport and application layers. We will publicly release both the data collected by TMC and the code for censorship evasion.

Provenance of Training without Training Data: Towards Privacy-Preserving DNN Model Ownership Verification

In the era of deep learning, it is critical to protect the intellectual property of high-performance deep neural network (DNN) models. Existing proposals, however, are subject to adversarial ownership forgery (e.g., methods based on watermarks or fingerprints) or require full access to the original training dataset for ownership verification (e.g., methods requiring the replay of the learning process). In this paper, we propose a novel Provenance of Training (PoT) scheme, the first empirical study towards verifying DNN model ownership without accessing any original dataset while being robust against existing attacks. At its core, PoT relies on a coherent model chain built from the intermediate checkpoints saved during model training to serve as the ownership certificate. Through an in-depth analysis of model training, we propose six key properties that a legitimate model chain shall naturally hold. In contrast, it is difficult for the adversary to forge a model chain that satisfies these properties simultaneously without performing actual training. We systematically analyze PoT’s robustness against various possible attacks, including the adaptive attacks that are designed given the full knowledge of PoT’s design, and further perform extensive empirical experiments to demonstrate our security analysis.

Efficient and Low Overhead Website Fingerprinting Attacks and Defenses based on TCP/IP Traffic

Website fingerprinting attack is an extensively studied technique used in a web browser to analyze traffic patterns and thus infer confidential information about users. Several website fingerprinting attacks based on machine learning and deep learning tend to use the most typical features to achieve a satisfactory performance of attacking rate. However, these attacks suffer from several practical implementation factors, such as a skillfully pre-processing step or a clean dataset. To defend against such attacks, random packet defense (RPD) with a high cost of excessive network overhead is usually applied. In this work, we first propose a practical filter-assisted attack against RPD, which can filter out the injected noises using the statistical characteristics of TCP/IP traffic. Then, we propose a list-assisted defensive mechanism to defend the proposed attack method. To achieve a configurable trade-off between the defense and the network overhead, we further improve the list-based defense by a traffic splitting mechanism, which can combat the mentioned attacks as well as save a considerable amount of network overhead. In the experiments, we collect real-life traffic patterns using three mainstream browsers, i.e., Microsoft Edge, Google Chrome, and Mozilla Firefox, and extensive results conducted on the closed and open-world datasets show the effectiveness of the proposed algorithms in terms of defense accuracy and network efficiency.

MaSS: Model-agnostic, Semantic and Stealthy Data Poisoning Attack on Knowledge Graph Embedding

Open-source knowledge graphs are attracting increasing attention. Nevertheless, the openness also raises the concern of data poisoning attacks, that is, the attacker could submit malicious facts to bias the prediction of knowledge graph embedding (KGE) models. Existing studies on such attacks adopt a clear-box setting and neglect the semantic information of the generated facts, making them fail to attack in real-world scenarios. In this work, we consider a more rigorous setting and propose a model-agnostic, semantic, and stealthy data poisoning attack on KGE models from a practical perspective. The main design of our work is to inject indicative paths to make the infected model predict certain malicious facts. With the aid of the proposed opaque-box path injection theory, we theoretically reveal that the attack success rate under the opaque-box setting is determined by the plausibility of triplets on the indicative path. Based on this, we develop a novel and efficient algorithm to search paths that maximize the attack goal, satisfy certain semantic constraints, and preserve certain stealthiness, i.e., the normal functionality of the target KGE will not be influenced although it predicts wrong facts given certain queries. Through extensive evaluation of benchmark datasets and 6 typical knowledge graph embedding models as the victims, we validate the effectiveness in terms of attack success rate (ASR) under opaque-box setting and stealthiness. For example, on FB15k-237, our attack achieves a ASR on DeepPath, with an average ASR over when attacking various KGE models under the opaque-box setting.

Curriculum Graph Poisoning

Despite the success of graph neural networks (GNNs) over the Web in recent years, the typical transductive learning setting for node classification requires GNNs to be retrained frequently, making them vulnerable to poisoning attacks by corrupting the training graph. Poisoning attacks on graphs are, however, non-trivial as the attack space is potentially large, and the discrete graph structure makes the poisoning function non-differentiable. In this paper, we revisit the bi-level optimization problem in graph poisoning and propose a novel graph poisoning method, termed Curriculum Graph Poisoning (CuGPo), inspired by curriculum learning. In contrast to other poisoning attacks that use heuristics or directly optimize the graph, our method learns to generate poisoned graphs from basic adversarial knowledge first and advanced knowledge later. Specifically, for the outer optimization, we utilize the slightly perturbed graphs which represent the easy poisoning task at the beginning, and then enlarge the attack space until the final; for the inner optimization, we firstly exploit the knowledge from the clean graph and then adapt quickly to perturbed graphs to obtain the adversarial knowledge. Extensive experiments demonstrate that CuGPo achieves state-of-the-art performance in graph poisoning attacks.

On How Zero-Knowledge Proof Blockchain Mixers Improve, and Worsen User Privacy

Zero-knowledge proof (ZKP) mixers are one of the most widely-used blockchain privacy solutions, operating on top of smart contract-enabled blockchains. We find that ZKP mixers are tightly intertwined with the growing number of Decentralized Finance (DeFi) attacks and Blockchain Extractable Value (BEV) extractions. Through coin flow tracing, we discover that 205 blockchain attackers and 2, 595 BEV extractors leverage mixers as their source of funds, while depositing a total attack revenue of 412.87M USD. Moreover, the US OFAC sanctions against the largest ZKP mixer, Tornado.Cash, have reduced the mixer’s daily deposits by more than .

Further, ZKP mixers advertise their level of privacy through a so-called anonymity set size, which similarly to k-anonymity allows a user to hide among a set of k other users. Through empirical measurements, we, however, find that these anonymity set claims are mostly inaccurate. For the most popular mixers on Ethereum (ETH) and Binance Smart Chain (BSC), we show how to reduce the anonymity set size on average by and respectively. Our empirical evidence is also the first to suggest a differing privacy-predilection of users on ETH and BSC.

State-of-the-art ZKP mixers are moreover interwoven with the DeFi ecosystem by offering anonymity mining (AM) incentives, i.e., users receive monetary rewards for mixing coins. However, contrary to the claims of related work, we find that AM does not necessarily improve the quality of a mixer’s anonymity set. Our findings indicate that AM attracts privacy-ignorant users, who then do not contribute to improving the privacy of other mixer users.

Transferring Audio Deepfake Detection Capability across Languages

The proliferation of deepfake content has motivated a surge of detection studies. However, existing detection methods in the audio area exclusively work in English, and there is a lack of data resources in other languages. Cross-lingual deepfake detection, a critical but rarely explored area, urges more study. This paper conducts the first comprehensive study on the cross-lingual perspective of deepfake detection. We observe that English data enriched in deepfake algorithms can teach a detector the knowledge of various spoofing artifacts, contributing to performing detection across language domains. Based on the observation, we first construct a first-of-its-kind cross-lingual evaluation dataset including heterogeneous spoofed speech uttered in the two most widely spoken languages, then explored domain adaptation (DA) techniques to transfer the artifacts detection capability and propose effective and practical DA strategies fitting the cross-lingual scenario. Our adversarial-based DA paradigm teaches the model to learn real/fake knowledge while losing language dependency. Extensive experiments over 137-hour audio clips validate the adapted models can detect fake audio generated by unseen algorithms in the new domain.

NetGuard: Protecting Commercial Web APIs from Model Inversion Attacks using GAN-generated Fake Samples

Recently more and more cloud service providers (e.g., Microsoft, Google, and Amazon) have commercialized their well-trained deep learning models by providing limited access via web API interfaces. However, it is shown that these APIs are susceptible to model inversion attacks, where attackers can recover the training data with high fidelity, which may cause serious privacy leakage.Existing defenses against model inversion attacks, however, hinder the model performance and are ineffective for more advanced attacks, e.g., Mirror [4]. In this paper, we proposed NetGuard, a novel utility-aware defense methodology against model inversion attacks (MIAs). Unlike previous works that perturb prediction outputs of the victim model, we propose to mislead the MIA effort by inserting engineered fake samples during the training process. A generative adversarial network (GAN) is carefully built to construct fake training samples to mislead the attack model without degrading the performance of the victim model. Besides, we adopt continual learning to further improve the utility of the victim model. Extensive experiments on CelebA, VGG-Face, and VGG-Face2 datasets show that NetGuard is superior to existing defenses, including DP [37] and Ad-mi [32] on state-of-the-art model inversion attacks, i.e., DMI [8], Mirror [4], Privacy [12], and Alignment [34].

Web Photo Source Identification based on Neural Enhanced Camera Fingerprint

With the growing popularity of smartphone photography in recent years, web photos play an increasingly important role in all walks of life. Source camera identification of web photos aims to establish a reliable linkage from the captured images to their source cameras, and has a broad range of applications, such as image copyright protection, user authentication, investigated evidence verification, etc. This paper presents an innovative and practical source identification framework that employs neural-network enhanced sensor pattern noise to trace back web photos efficiently while ensuring security. Our proposed framework consists of three main stages: initial device fingerprint registration, fingerprint extraction and cryptographic connection establishment while taking photos, and connection verification between photos and source devices. By incorporating metric learning and frequency consistency into the deep network design, our proposed fingerprint extraction algorithm achieves state-of-the-art performance on modern smartphone photos for reliable source identification. Meanwhile, we also propose several optimization sub-modules to prevent fingerprint leakage and improve accuracy and efficiency. Finally for practical system design, two cryptographic schemes are introduced to reliably identify the correlation between registered fingerprint and verified photo fingerprint, i.e. fuzzy extractor and zero-knowledge proof (ZKP). The codes for fingerprint extraction network and benchmark dataset with modern smartphone cameras photos are all publicly available at https://github.com/PhotoNecf/PhotoNecf 1.

TFE-GNN: A Temporal Fusion Encoder Using Graph Neural Networks for Fine-grained Encrypted Traffic Classification

Encrypted traffic classification is receiving widespread attention from researchers and industrial companies. However, the existing methods only extract flow-level features, failing to handle short flows because of unreliable statistical properties, or treat the header and payload equally, failing to mine the potential correlation between bytes. Therefore, in this paper, we propose a byte-level traffic graph construction approach based on point-wise mutual information (PMI), and a model named Temporal Fusion Encoder using Graph Neural Networks (TFE-GNN) for feature extraction. In particular, we design a dual embedding layer, a GNN-based traffic graph encoder as well as a cross-gated feature fusion mechanism, which can first embed the header and payload bytes separately and then fuses them together to obtain a stronger feature representation. The experimental results on two real datasets demonstrate that TFE-GNN outperforms multiple state-of-the-art methods in fine-grained encrypted traffic classification tasks.

Time-manipulation Attack: Breaking Fairness against Proof of Authority Aura

As blockchain-based commercial projects and startups flourish, efficiency becomes one of the critical metrics in designing blockchain systems. Due to its high efficiency, Proof of Authority (PoA) Aura has become one of the most widely adopted consensus solutions for blockchains. Our research finds over 4,000 projects have used Aura and its variants. In this paper, we provide a rigorous analysis of Aura. We propose three types of time-manipulation attacks, where a malicious leader simply needs to modify the timestamp in its proposed block or delay it to extract extra benefits. These attacks can easily break the legal leader election, thus directly harming the fairness of the block proposal. We apply our attacks to a mature Aura project called OpenEthereum. By repeatedly conducting our attacks1 over 15 days, we find that an adversary can gain on average 200% mining rewards of their fair shares. Furthermore, such attacks can even indirectly break the finality of blocks and the safety of the system. Based on the deployment of Aura as of September 2022, the potentially affected market cap is up to 2.13 billion USD. As a by-product, we further discuss solutions to mitigate such issues and report our observations to official teams.

Meteor: Improved Secure 3-Party Neural Network Inference with Reducing Online Communication Costs

Secure neural network inference has been a promising solution to private Deep-Learning-as-a-Service, which enables the service provider and user to execute neural network inference without revealing their private inputs. However, the expensive overhead of current schemes is still an obstacle when applied in real applications. In this work, we present Meteor, an online communication-efficient and fast secure 3-party computation neural network inference system aginst semi-honest adversary in honest-majority. The main contributions of Meteor are two-fold: i) We propose a new and improved 3-party secret sharing scheme stemming from the linearity of replicated secret sharing, and design efficient protocols for the basic cryptographic primitives, including linear operations, multiplication, most significant bit extraction, and multiplexer. ii) Furthermore, we build efficient and secure blocks for the widely used neural network operators such as Matrix Multiplication, ReLU, and Maxpool, along with exploiting several specific optimizations for better efficiency. Our total communication with the setup phase is a little larger than SecureNN (PoPETs’19) and Falcon (PoPETs’21), two state-of-the-art solutions, but the gap is not significant when the online phase must be optimized as a priority. Using Meteor, we perform extensive evaluations on various neural networks. Compared to SecureNN and Falcon, we reduce the online communication costs by up to 25.6 × and 1.5 ×, and improve the running-time by at most 9.8 × (resp. 8.1 ×) and 1.5 × (resp. 2.1 ×) in LAN (resp. WAN) for the online inference.

Do NFTs’ Owners Really Possess their Assets? A First Look at the NFT-to-Asset Connection Fragility

Most NFTs (Non-Fungible Tokens) use multi-hop URLs to address the off-chain assets due to the costly on-chain storage, but the path from NFTs to the underlying assets is fraught with instability, which may degrade its value. Hence, this paper aims to answer the question: Is the NFT-to-Asset connection fragile? This paper makes a first step towards this end by characterizing NFT-to-Asset connections of 12,353 Ethereum NFT Contracts (6,234,141 NFTs in total) from three perspectives, storage, accessibility, and duplication. In order to overcome challenges of affecting the measurement accuracy, e.g., IPFS instability and the changing availability of both IPFS and servers’ data, we propose to leverage multiple gateways to enlarge the data coverage and extend a longer measurement period with non-trivial efforts. Results of our extensive study show that such connection is very fragile in practice. The loss, unavailability, or duplication of off-chain assets could render the value of NFTs worthless. For instance, we find that assets of 25.24% of Ethereum NFT contracts are not accessible, and 21.48% of Ethereum NFT contracts include duplicated assets. Our work sheds light on the fragility along the NFT-to-Asset connection, which could help the NFT community to better enhance the trust of off-chain assets.

Preserving Missing Data Distribution in Synthetic Data

Data from Web artifacts and from the Web is often sensitive and cannot be directly shared for data analysis. Therefore, synthetic data generated from the real data is increasingly used as a privacy-preserving substitute. In many cases, real data from the web has missing values where the missingness itself possesses important informational content, which domain experts leverage to improve their analysis. However, this information content is lost if either imputation or deletion is used before synthetic data generation. In this paper, we propose several methods to generate synthetic data that preserve both the observable and the missing data distributions. An extensive empirical evaluation over a range of carefully fabricated and real world datasets demonstrates the effectiveness of our approach.

Ginver: Generative Model Inversion Attacks Against Collaborative Inference

Deep Learning (DL) has been widely adopted in almost all domains, from threat recognition to medical diagnosis. Albeit its supreme model accuracy, DL imposes a heavy burden on devices as it incurs overwhelming system overhead to execute DL models, especially on Internet-of-Things (IoT) and edge devices. Collaborative inference is a promising approach to supporting DL models, by which the data owner (the victim) runs the first layers of the model on her local device and then a cloud provider (the adversary) runs the remaining layers of the model. Compared to offloading the entire model to the cloud, the collaborative inference approach is more data privacy-preserving as the owner’s model input is not exposed to outsiders. However, we show in this paper that the adversary can restore the victim’s model input by exploiting the output of the victim’s local model. Our attack is dubbed Ginver 1: Generative model inversion attacks against collaborative inference. Once trained, Ginver can infer the victim’s unseen model inputs without remaking the inversion attack model and thus has the generative capability. We extensively evaluate Ginver under different settings (e.g., white-box and black-box of the victim’s local model) and applications (e.g., CIFAR10 and FaceScrub datasets). The experimental results show that Ginver recovers high-quality images from the victims.

The Hitchhiker’s Guide to Facebook Web Tracking with Invisible Pixels and Click IDs

Over the past years, advertisement companies have used various tracking methods to persistently track users across the web. Such tracking methods usually include first and third-party cookies, cookie synchronization, as well as a variety of fingerprinting mechanisms. Facebook (FB) (now Meta) recently introduced a new tagging mechanism that attaches a one-time tag as a URL parameter (namely FBCLID) on outgoing links to other websites. Although such a tag does not seem to have enough information to persistently track users, we demonstrate that despite its ephemeral nature, when combined with FB Pixel, it can aid in persistently monitoring user browsing behavior across i) different websites, ii) different actions on each website, iii) time, i.e., both in the past as well as in the future. We refer to this online monitoring of users as FB web tracking.

We find that FB Pixel tracks a wide range of user activities on websites with alarming detail, especially on websites classified as sensitive categories under GDPR. Also, we show how the FBCLID tag can be used to match, and thus de-anonymize, activities of online users performed in the distant past (even before those users had a FB account) tracked by FB Pixel. In fact, by combining this tag with cookies that have rolling expiration dates, FB can also keep track of users’ browsing activities in the future as well. Our experimental results suggest that 23% of the 10k most popular websites have adopted this technology, and can contribute to this activity tracking on the web. Furthermore, our longitudinal study shows that this type of user activity tracking can go as far back as 2015.). Simply said, if a user creates for the first time a FB account today, FB could, under some conditions, match their anonymously collected past web browsing activity to their newly created FB profile, from as far back as 2015 and continue tracking their activity in the future.

All Your Shops Are Belong to Us: Security Weaknesses in E-commerce Platforms

Software as a Service (SaaS) e-commerce platforms for merchants allow individual business owners to set up their online stores almost instantly. Prior work has shown that the checkout flows and payment integration of some e-commerce applications are vulnerable to logic bugs with serious financial consequences, e.g., allowing “shopping for free”. Apart from checkout and payment integration, vulnerabilities in other e-commerce operations have remained largely unexplored, even though they can have far more serious consequences, e.g., enabling “store takeover”. In this work, we design and implement a security evaluation framework to uncover security vulnerabilities in e-commerce operations beyond checkout/payment integration. We use this framework to analyze 32 representative e-commerce platforms, including web services of 24 commercial SaaS platforms and 15 associated Android apps, and 8 open source platforms; these platforms host over 10 million stores as approximated through Google dorks. We uncover several new vulnerabilities with serious consequences, e.g., allowing an attacker to take over all stores under a platform, and listing illegal products at a victim’s store—in addition to “shopping for free” bugs, without exploiting the checkout/payment process. We found 12 platforms vulnerable to store takeover (affecting 41000+ stores) and 6 platforms vulnerable to shopping for free (affecting 19000+ stores, approximated via Google dorks on Oct. 8, 2022). We have responsibly disclosed the vulnerabilities to all affected parties, and requested four CVEs (three assigned, and one is pending review).

An Empirical Study of the Usage of Checksums for Web Downloads

Checksums, typically provided on webpages and generated from cryptographic hash functions (e.g., MD5, SHA256) or signature schemes (e.g., PGP), are commonly used on websites to enable users to verify that the files they download have not been tampered with when stored on possibly untrusted servers. In this paper, we elucidate the current practices regarding the usage of checksums for web downloads (hash functions used, visibility and validity of checksums, type of websites and files, etc.), as this has been mostly overlooked so far. Using a snowball-sampling strategy for the 200000 most popular domains of the Web, we first crawled a dataset of  8.5M webpages, from which we built, through an active-learning approach, a unique dataset of 277 diverse webpages that contain checksums. Our analysis of these webpages reveals interesting findings about the usage of checksums. For instance, it shows that checksums are used mostly to verify program files, that weak hash functions are frequently used, and that a non-negligible proportion of the checksums provided on webpages do not match that of their associated files. Finally, we complement our analysis with a survey of the webmasters of the considered webpages (N = 26), thus shedding light on the reasons behind the checksum-related choices they make.

Not Seen, Not Heard in the Digital World! Measuring Privacy Practices in Children’s Apps

The digital age has brought a world of opportunity to children. Connectivity can be a game-changer for some of the world’s most marginalized children. However, while legislatures around the world have enacted regulations to protect children’s online privacy, and app stores have instituted various protections, privacy in mobile apps remains a growing concern for parents and wider society. In this paper, we explore the potential privacy issues and threats that exist in these apps. We investigate 20195 mobile apps from the Google Play store that are designed particularly for children (Family apps) or include children in their target user groups (Normal apps). Using both static and dynamic analysis, we find that 4.47% of Family apps request location permissions, even though collecting location information from children is forbidden by the Play store, and 81.25% of Family apps use trackers (which are not allowed in children’s apps). Even major developers with 40+ kids apps on the Play store use ad trackers. Furthermore, we find that most permission request notifications are not well designed for children, and 19.25% apps have inconsistent content age ratings across the different protection authorities. Our findings suggest that, despite significant attention to children’s privacy, a large gap between regulatory provisions, app store policies, and actual development practices exist. Our research sheds light for government policymakers, app stores, and developers.

Automatic Discovery of Emerging Browser Fingerprinting Techniques

With the progression of modern browsers, online tracking has become the most concerning issue for preserving privacy on the web. As major browser vendors plan to or already ban third-party cookies, trackers have to shift towards browser fingerprinting by incorporating novel browser APIs into their tracking arsenal. Understanding how new browser APIs are abused in browser fingerprinting techniques is a significant step toward ensuring protection from online tracking.

In this paper, we propose a novel hybrid system, named BFAD, that automatically identifies previously unknown browser fingerprinting APIs in the wild. The system combines dynamic and static analysis to accurately reveal browser API usage and automatically infer browser fingerprinting behavior. Based on the observation that a browser fingerprint is constructed by pulling information from multiple APIs, we leverage dynamic analysis and a locality-based algorithm to discover all involved APIs and static analysis on the dataflow of fingerprinting information to accurately associate them together. Our system discovers 231 fingerprinting APIs in Alexa top 10K domains, starting with only 35 commonly known fingerprinting APIs and 17 data transmission APIs. Out of 231 APIs, 161 of them are not identified by state-of-the-art detection systems. Since our approach is fully automated, we repeat our experiments 11 months later and discover 18 new fingerprinting APIs that were not discovered in our previous experiment. We present with case studies the fingerprinting ability of a total of 249 detected APIs.

BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection

As various forms of fraud proliferate on Ethereum, it is imperative to safeguard against these malicious activities to protect susceptible users from being victimized. While current studies solely rely on graph-based fraud detection approaches, it is argued that they may not be well-suited for dealing with highly repetitive, skew-distributed and heterogeneous Ethereum transactions. To address these challenges, we propose BERT4ETH, a universal pre-trained Transformer encoder that serves as an account representation extractor for detecting various fraud behaviors on Ethereum. BERT4ETH features the superior modeling capability of Transformer to capture the dynamic sequential patterns inherent in Ethereum transactions, and addresses the challenges of pre-training a BERT model for Ethereum with three practical and effective strategies, namely repetitiveness reduction, skew alleviation and heterogeneity modeling. Our empirical evaluation demonstrates that BERT4ETH outperforms state-of-the-art methods with significant enhancements in terms of the phishing account detection and de-anonymization tasks. The code for BERT4ETH is available at: https://github.com/git-disl/BERT4ETH.

Training-free Lexical Backdoor Attacks on Language Models

Large-scale language models have achieved tremendous success across various natural language processing (NLP) applications. Nevertheless, language models are vulnerable to backdoor attacks, which inject stealthy triggers into models for steering them to undesirable behaviors. Most existing backdoor attacks, such as data poisoning, require further (re)training or fine-tuning language models to learn the intended backdoor patterns. The additional training process however diminishes the stealthiness of the attacks, as training a language model usually requires long optimization time, a massive amount of data, and considerable modifications to the model parameters.

In this work, we propose Training-Free Lexical Backdoor Attack (TFLexAttack) as the first training-free backdoor attack on language models. Our attack is achieved by injecting lexical triggers into the tokenizer of a language model via manipulating its embedding dictionary using carefully designed rules. These rules are explainable to human developers which inspires attacks from a wider range of hackers. The sparse manipulation of the dictionary also habilitates the stealthiness of our attack. We conduct extensive experiments on three dominant NLP tasks based on nine language models to demonstrate the effectiveness and universality of our attack. The code of this work is available at https://github.com/Jinxhy/TFLexAttack.

The Benefits of Vulnerability Discovery and Bug Bounty Programs: Case Studies of Chromium and Firefox

Recently, bug-bounty programs have gained popularity and become a significant part of the security culture of many organizations. Bug-bounty programs enable organizations to enhance their security posture by harnessing the diverse expertise of crowds of external security experts (i.e., bug hunters). Nonetheless, quantifying the benefits of bug-bounty programs remains elusive, which presents a significant challenge for managing them. Previous studies focused on measuring their benefits in terms of the number of vulnerabilities reported or based on the properties of the reported vulnerabilities, such as severity or exploitability. However, beyond these inherent properties, the value of a report also depends on the probability that the vulnerability would be discovered by a threat actor before an internal expert could discover and patch it. In this paper, we present a data-driven study of the Chromium and Firefox vulnerability-reward programs. First, we estimate the difficulty of discovering a vulnerability using the probability of rediscovery as a novel metric. Our findings show that vulnerability discovery and patching provide clear benefits by making it difficult for threat actors to find vulnerabilities; however, we also identify opportunities for improvement, such as incentivizing bug hunters to focus more on development releases. Second, we compare the types of vulnerabilities that are discovered internally vs. externally and those that are exploited by threat actors. We observe significant differences between vulnerabilities found by external bug hunters, internal security teams, and external threat actors, which indicates that bug-bounty programs provide an important benefit by complementing the expertise of internal teams, but also that external hunters should be incentivized more to focus on the types of vulnerabilities that are likely to be exploited by threat actors.

Cross-Modality Mutual Learning for Enhancing Smart Contract Vulnerability Detection on Bytecode

Over the past couple of years, smart contracts have been plagued by multifarious vulnerabilities, which have led to catastrophic financial losses. Their security issues, therefore, have drawn intense attention. As countermeasures, a family of tools has been developed to identify vulnerabilities in smart contracts at the source-code level. Unfortunately, only a small fraction of smart contracts is currently open-sourced. Another spectrum of work is presented to deal with pure bytecode, but most such efforts still suffer from relatively low performance due to the inherent difficulty in restoring abundant semantics in the source code from the bytecode.

This paper proposes a novel cross-modality mutual learning framework for enhancing smart contract vulnerability detection on bytecode. Specifically, we engage in two networks, a student network as the primary network and a teacher network as the auxiliary network. takes two modalities, i.e., source code and its corresponding bytecode as inputs, while is fed with only bytecode. By learning from , is trained to infer the missed source code embeddings and combine both modalities to approach precise vulnerability detection. To further facilitate mutual learning between and , we present a cross-modality mutual learning loss and two transfer losses. As a side contribution, we construct and release a labeled smart contract dataset that concerns four types of common vulnerabilities. Experimental results show that our method significantly surpasses state-of-the-art approaches.

Net-track: Generic Web Tracking Detection Using Packet Metadata

While third-party trackers breach users’ privacy by compiling large amounts of personal data through web tracking techniques, combating these trackers is still left at the hand of each user. Although network operators may attempt a network-wide detection of trackers through inspecting all web traffic inside the network, their methods are not only privacy-intrusive but of limited accuracy as these are susceptible to domain changes or ineffective against encrypted traffic. To this end, in this paper, we propose Net-track, a novel approach to managing a secure web environment through platform-independent, encryption-agnostic detection of trackers. Utilizing only side-channel data from network traffic that are still available when encrypted, Net-track accurately detects trackers network-wide, irrespective of user’s browsers or devices without looking into packet payloads or resources fetched from the web server. This prevents user data from leaking to tracking servers in a privacy-preserving manner. By measuring statistics from traffic traces and their similarities, we show distinctions between benign traffic and tracker traffic in their traffic patterns and build Net-track based on the features that fully capture trackers’ distinctive characteristics. Evaluation results show that Net-track is able to detect trackers with 94.02% accuracy and can even discover new trackers yet unrecognized by existing filter lists. Furthermore, Net-track shows its potential for real-time detection, maintaining its performance when using only a portion of each traffic trace.

The Chameleon on the Web: an Empirical Study of the Insidious Proactive Web Defacements

Web defacement is one of the major promotional channels for online underground economies. It regularly compromises benign websites and injects fraudulent content to promote illicit goods and services. It inflicts significant harm to websites’ reputations and revenues and may lead to legal ramifications. In this paper, we uncover proactive web defacements, where the involved web pages (i.e., landing pages) proactively deface themselves within browsers using JavaScript (i.e., control scripts). Proactive web defacements have not yet received attention from research communities, anti-hacking organizations, or law-enforcement officials. To detect proactive web defacements, we designed a practical tool, PACTOR. It runs in the browser and intercepts JavaScript API calls that manipulate web page content. It takes snapshots of the rendered HTML source code immediately before and after the intercepted API calls and detects proactive web defacements by visually comparing every two consecutive snapshots. Our two-month empirical study, using PACTOR, on 2,454 incidents of proactive web defacements shows that they can evade existing URL safety-checking tools and effectively promote the ranking of their landing pages using legitimate content/keywords. We also investigated the vendor network of proactive web defacements and reported all the involved domains to law-enforcement officials and URL-safety checking tools.

Shield: Secure Allegation Escrow System with Stronger Guarantees

The rising issues of harassment, exploitation, corruption and other forms of abuse have led victims to seek comfort by acting in unison against common perpetrators. This is corroborated by the widespread #MeToo movement, which was explicitly against sexual harassment. Installation of escrow systems has allowed victims to report such incidents. The escrows are responsible for identifying the perpetrator and taking the necessary action to bring justice to all its victims. However, users hesitate to participate in these systems due to the fear of such sensitive reports being leaked to perpetrators, who may further misuse them. Thus, to increase trust in the system, cryptographic solutions are being designed to realize web-based secure allegation escrow (SAE) systems.

While the work of Arun et al. (NDSS’20) presents the state-of-the-art solution, we identify attacks that can leak sensitive information and compromise victim privacy. We also report issues present in prior works that were left unidentified. Having identified the attacks and issues in all prior works, we put forth an SAE system that overcomes these while retaining all the existing salient features. The cryptographic technique of secure multiparty computation (MPC) serves as the primary underlying tool in designing our system. At the heart of our system lies a new duplicity check protocol and an improved matching protocol. We also provide essential features such as allegation modification and deletion, which were absent in the state of the art. To demonstrate feasibility, we benchmark the proposed system with state-of-the-art MPC protocols and report the cost of processing an allegation. Different settings that affect system performance are analyzed, and the reported values showcase the practicality of our solution.

Unnoticeable Backdoor Attacks on Graph Neural Networks

Graph Neural Networks (GNNs) have achieved promising results in various tasks such as node classification and graph classification. Recent studies find that GNNs are vulnerable to adversarial attacks. However, effective backdoor attacks on graphs are still an open problem. In particular, backdoor attack poisons the graph by attaching triggers and the target class label to a set of nodes in the training graph. The backdoored GNNs trained on the poisoned graph will then be misled to predict test nodes to target class once attached with triggers. Though there are some initial efforts in graph backdoor attacks, our empirical analysis shows that they may require a large attack budget for effective backdoor attacks and the injected triggers can be easily detected and pruned. Therefore, in this paper, we study a novel problem of unnoticeable graph backdoor attacks with limited attack budget. To fully utilize the attack budget, we propose to deliberately select the nodes to inject triggers and target class labels in the poisoning phase. An adaptive trigger generator is deployed to obtain effective triggers that are difficult to be noticed. Extensive experiments on real-world datasets against various defense strategies demonstrate the effectiveness of our proposed method in conducting effective unnoticeable backdoor attacks.

Bad Apples: Understanding the Centralized Security Risks in Decentralized Ecosystems

The blockchain-powered decentralized applications and systems have been widely deployed in recent years. The decentralization feature promises users anonymity, security, and non-censorship, which is especially welcomed in the areas of decentralized finance and digital assets. From the perspective of most common users, a decentralized ecosystem means every service follows the principle of decentralization. However, we find that the services in a decentralized ecosystem still may contain centralized components or scenarios, like third-party SDKs and privileged operations, which violate the promise of decentralization and may cause a series of centralized security risks. In this work, we systematically study the centralized security risks existing in decentralized ecosystems. Specifically, we identify seven centralized security risks in the deployment of two typical decentralized services – crypto wallets and DApps, such as anonymity loss and overpowered owner. Also, to measure these risks in the wild, we designed an automated detection tool called Naga and carried out large-scale experiments. Based on the measurement of 28 Ethereum crypto wallets (Android version) and 110,506 on-chain smart contracts, the result shows that the centralized security risks are widespread. Up to 96.4% of wallets and 83.5% of contracts exist at least one security risk, including 260 well-known tokens with a total market cap of over $98 billion.

Scan Me If You Can: Understanding and Detecting Unwanted Vulnerability Scanning

Web vulnerability scanners (WVS) are an indispensable tool for penetration testers and developers of web applications, allowing them to identify and fix low-hanging vulnerabilities before they are discovered by attackers. Unfortunately, malicious actors leverage the very same tools to identify and exploit vulnerabilities in third-party websites. Existing research in the WVS space is largely concerned with how many vulnerabilities these tools can discover, as opposed to trying to identify the tools themselves when they are used illicitly.

In this work, we design a testbed to characterize web vulnerability scanners using browser-based and network-based fingerprinting techniques. We conduct a measurement study over 12 web vulnerability scanners as well as 159 users who were recruited to interact with the same web applications that were targeted by the evaluated WVSs. By contrasting the traffic and behavior of these two groups, we discover tool-specific and type-specific behaviors in WVSs that are absent from regular users. Based on these observations,

we design and build ScannerScope, a machine-learning-based, web vulnerability scanner detection system. ScannerScope consists of a transparent reverse proxy that injects fingerprinting modules on the fly without the assistance (or knowledge) of the protected web applications. Our evaluation results show that ScannerScope can effectively detect WVSs and protect web applications against unwanted vulnerability scanning, with a detection accuracy of over 99% combined with near-zero false positives on human-visitor traffic. Finally, we show that the asynchronous design of ScannerScope results in a negligible impact on server performance and demonstrate that its classifier can resist adversarial ML attacks launched by sophisticated adversaries.

The More Things Change, the More They Stay the Same: Integrity of Modern JavaScript

The modern web is a collection of remote resources that are identified by their location and composed of interleaving networks of trust. Supply chain attacks compromise the users of a target domain by leveraging its often large set of trusted third parties who provide resources such as JavaScript. The ubiquity of JavaScript, paired with its ability to execute arbitrary code on client machines, makes this particular web resource an ideal vector for supply chain attacks. Currently, there exists no robust method for users browsing the web to verify that the script content they receive from a third party is the expected content.

In this paper, we present key insights to inform the design of robust integrity mechanisms, derived from our large-scale analyses of the 6M scripts we collected while crawling 44K domains every day for 77 days. We find that scripts that frequently change should be considered first-class citizens in the modern web ecosystem, and that the ways in which scripts change remain constant over time. Furthermore, we present analyses on the use of strict integrity verification (e.g., Subresource Integrity) at the granularity of the script providers themselves, offering a more complete perspective and demonstrating that the use of strict integrity alone cannot provide satisfactory security guarantees. We conclude that it is infeasible for a client to distinguish benign changes from malicious ones without additional, external knowledge, motivating the need for a new protocol to provide clients the necessary context to assess the potential ramifications of script changes.

Quantifying and Defending against Privacy Threats on Federated Knowledge Graph Embedding

Knowledge Graph Embedding (KGE) is a fundamental technique that extracts expressive representation from knowledge graph (KG) to facilitate diverse downstream tasks. The emerging federated KGE (FKGE) collaboratively trains from distributed KGs held among clients while avoiding exchanging clients’ sensitive raw KGs, which can still suffer from privacy threats as evidenced in other federated model trainings (e.g., neural networks). However, quantifying and defending against such privacy threats remain unexplored for FKGE which possesses unique properties not shared by previously studied models. In this paper, we conduct the first holistic study of the privacy threat on FKGE from both attack and defense perspectives. For the attack, we quantify the privacy threat by proposing three new inference attacks, which reveal substantial privacy risk by successfully inferring the existence of the KG triple from victim clients. For the defense, we propose DP-Flames, a novel differentially private FKGE with private selection, which offers a better privacy-utility tradeoff by exploiting the entity-binding sparse gradient property of FKGE and comes with a tight privacy accountant by incorporating the state-of-the-art private selection technique. We further propose an adaptive privacy budget allocation policy to dynamically adjust defense magnitude across the training procedure. Comprehensive evaluations demonstrate that the proposed defense can successfully mitigate the privacy threat by effectively reducing the success rate of inference attacks from to on average with only a modest utility decrease.

AppSniffer: Towards Robust Mobile App Fingerprinting Against VPN

Application fingerprinting is a useful data analysis technique for network administrators, marketing agencies, and security analysts. For example, an administrator can adopt application fingerprinting techniques to determine whether a user’s network access is allowed. Several mobile application fingerprinting techniques (e.g., FlowPrint, AppScanner, and ET-BERT) were recently introduced to identify applications using the characteristics of network traffic. However, we find that the performance of the existing mobile application fingerprinting systems significantly degrades when a virtual private network (VPN) is used. To address such a shortcoming, we propose a framework dubbed AppSniffer that uses a two-stage classification process for mobile app fingerprinting. In the first stage, we distinguish VPN traffic from normal traffic; in the second stage, we use the optimal model for each traffic type. Specifically, we propose a stacked ensemble model using Light Gradient Boosting Machine (LightGBM) and a FastAI library-based neural network model to identify applications’ traffic when a VPN is used. To show the feasibility of AppSniffer, we evaluate the detection accuracy of AppSniffer for 150 popularly used Android apps. Our experimental results show that AppSniffer effectively identifies mobile applications over VPNs with F1-scores between 84.66% and 95.49% across four different VPN protocols. In contrast, the best state-of-the-art method (i.e., AppScanner) demonstrates significantly lower F1-scores between 25.63% and 47.56% in the same settings. Overall, when normal traffic and VPN traffic are mixed, AppSniffer achieves an F1-score of 90.63%, which is significantly better than AppScanner that shows an F1-score of 70.36%.

RICC: Robust Collective Classification of Sybil Accounts

A Sybil attack is a critical threat that undermines the trust and integrity of web services by creating and exploiting a large number of fake (i.e., Sybil) accounts. To mitigate this threat, previous studies have proposed leveraging collective classification to detect Sybil accounts. Recently, researchers have demonstrated that state-of-the-art adversarial attacks are able to bypass existing collective classification methods, posing a new security threat. To this end, we propose RICC, the first robust collective classification framework, designed to identify adversarial Sybil accounts created by adversarial attacks. RICC leverages the novel observation that these adversarial attacks are highly tailored to a target collective classification model to optimize the attack budget. Owing to this adversarial strategy, the classification results for adversarial Sybil accounts often significantly change when deploying a new training set different from the original training set used for assigning prior reputation scores to user accounts. Leveraging this observation, RICC achieves robustness in collective classification by stabilizing classification results across different training sets randomly sampled in each round. RICC achieves false negative rates of 0.01, 0.11, 0.00, and 0.01 in detecting adversarial Sybil accounts for the Enron, Facebook, Twitter_S, and Twitter_L datasets, respectively. It also attains respective AUCs of 0.99, 1.00, 0.89, and 0.74 for these datasets, achieving high performance on the original task of detecting Sybil accounts. RICC significantly outperforms all existing Sybil detection methods, demonstrating superior robustness and efficacy in the collective classification of Sybil accounts.

IRWArt: Levering Watermarking Performance for Protecting High-quality Artwork Images

Increasing artwork plagiarism incidents underscores the urgent need for reliable copyright protection for high-quality artwork images. Although watermarking is helpful to this issue, existing methods are limited in imperceptibility and robustness. To provide high-level protection for valuable artwork images, we propose a novel invisible robust watermarking framework, dubbed as IRWArt. In our architecture, the embedding and recovery of the watermark are treated as a pair of image transformations’ inverse problems, and can be implemented through the forward and backward processes of an invertible neural networks (INN), respectively. For high visual quality, we embed the watermark in high-frequency domains with minimal impact on artwork and supervise image reconstruction using a human visual system(HVS)-consistent deep perceptual loss. For strong plagiarism-resistant, we construct a quality enhancement module for the embedded image against possible distortions caused by plagiarism actions. Moreover, the two-stagecontrastive training strategy enables the simultaneous realization of the above two goals. Experimental results on 4 datasets demonstrate the superiority of our IRWArt over other state-of-the-art watermarking methods. Code: https://github.com/1024yy/IRWArt.

Sanitizing Sentence Embeddings (and Labels) for Local Differential Privacy

Differentially private (DP) learning, notably DP stochastic gradient descent (DP-SGD), has limited applicability in fine-tuning gigantic pre-trained language models (LMs) for natural language processing tasks. The culprit is the perturbation of gradients (as gigantic as entire models), leading to significant efficiency and accuracy drops.

We show how to achieve metric-based local DP (LDP) by sanitizing (high-dimensional) sentence embedding, extracted by LMs and much smaller than gradients. For potential utility improvement, we impose a consistency constraint on the sanitization. We explore two approaches: One is brand new and can directly output consistent noisy embeddings; the other is an upgradation with post-processing. To further mitigate “the curse of dimensionality,” we introduce two trainable linear maps for mediating dimensions without hurting privacy or utility. Our protection can effectively defend against privacy threats on embeddings. It also naturally extends to inference.

Our experiments1 show that we reach the non-private accuracy under properly configured parameters, e.g., 0.92 for SST-2 with a privacy budget ϵ = 10 and the reduced dimension as 16. We also sanitize the label for LDP (with another small privacy budget) with limited accuracy losses to fully protect every sequence-label pair.

ZTLS: A DNS-based Approach to Zero Round Trip Delay in TLS handshake

Establishing secure connections fast to end-users is crucial to online services. However, when a client sets up a TLS session with a server, the TLS handshake needs one round trip time (RTT) to negotiate a session key. Additionally, establishing a TLS session also requires a DNS lookup (e.g., the A record lookup to fetch the IP address of the server) and a TCP handshake. In this paper, we propose ZTLS to eliminate the 1-RTT latency for the TLS handshake by leveraging the DNS. In ZTLS, a server distributes TLS handshake-related data (i.e., Diffie-Hellman elements), dubbed Z-data, as DNS records. A ZTLS client can fetch Z-data by DNS lookups and derive a session key. With the session key, the client can send encrypted data along with its ClientHello, achieving 0-RTT. ZTLS supports incremental deployability on the current TLS-based infrastructure. Our prototype-based experiments show that ZTLS is 1-RTT faster than TLS in terms of the first response time.

AgrEvader: Poisoning Membership Inference against Byzantine-robust Federated Learning

The Poisoning Membership Inference Attack (PMIA) is a newly emerging privacy attack that poses a significant threat to federated learning (FL). An adversary conducts data poisoning (i.e., performing adversarial manipulations on training examples) to extract membership information by exploiting the changes in loss resulting from data poisoning. The PMIA significantly exacerbates the traditional poisoning attack that is primarily focused on model corruption. However, there has been a lack of a comprehensive systematic study that thoroughly investigates this topic. In this work, we conduct a benchmark evaluation to assess the performance of PMIA against the Byzantine-robust FL setting that is specifically designed to mitigate poisoning attacks. We find that all existing coordinate-wise averaging mechanisms fail to defend against the PMIA, while the detect-then-drop strategy was proven to be effective in most cases, implying that the poison injection is memorized and the poisonous effect rarely dissipates. Inspired by this observation, we propose AgrEvader, a PMIA that maximizes the adversarial impact on the victim samples while circumventing the detection by Byzantine-robust mechanisms. AgrEvader significantly outperforms existing PMIAs. For instance, AgrEvader achieved a high attack accuracy of between 72.78% (on CIFAR-10) to 97.80% (on Texas100), which is an average accuracy increase of 13.89% compared to the strongest PMIA reported in the literature. We evaluated AgrEvader on five datasets across different domains, against a comprehensive list of threat models, which included black-box, gray-box and white-box models for targeted and non-targeted scenarios. AgrEvader demonstrated consistent high accuracy across all settings tested. The code is available at: https://github.com/PrivSecML/AgrEvader.

SESSION: Semantics and Knowledge

Event Prediction using Case-Based Reasoning over Knowledge Graphs

Applying link prediction (LP) methods over knowledge graphs (KG) for tasks such as causal event prediction presents an exciting opportunity. However, typical LP models are ill-suited for this task as they are incapable of performing inductive link prediction for new, unseen event entities and they require retraining as knowledge is added or changed in the underlying KG. We introduce a case-based reasoning model, EvCBR, to predict properties about new consequent events based on similar cause-effect events present in the KG. EvCBR uses statistical measures to identify similar events and performs path-based predictions, requiring no training step. To generalize our methods beyond the domain of event prediction, we frame our task as a 2-hop LP task, where the first hop is a causal relation connecting a cause event to a new effect event and the second hop is a property about the new event which we wish to predict. The effectiveness of our method is demonstrated using a novel dataset of newsworthy events with causal relations curated from Wikidata, where EvCBR outperforms baselines including translational-distance-based, GNN-based, and rule-based LP models.

CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge

Automatically generating textual descriptions for massive unlabeled images on the web can greatly benefit realistic web applications, e.g. multimodal retrieval and recommendation. However, existing models suffer from the problem of generating “over-generic” descriptions, such as their tendency to generate repetitive sentences with common concepts for different images. These generic descriptions fail to provide sufficient textual semantics for ever-changing web images. Inspired by the recent success of Vision-Language Pre-training (VLP) models that learn diverse image-text concept alignment during pretraining, we explore leveraging their cross-modal pre-trained knowledge to automatically enrich the textual semantics of image descriptions. With no need for additional human annotations, we propose a plug-and-play framework, i.e CapEnrich, to complement the generic image descriptions with more semantic details. Specifically, we first propose an automatic data-building strategy to get desired training sentences, based on which we then adopt prompting strategies, i.e. learnable and template prompts, to incentivize VLP models to generate more textual details. For learnable templates, we fix the whole VLP model and only tune the prompt vectors, which leads to two advantages: 1) the pre-training knowledge of VLP models can be reserved as much as possible to describe diverse visual concepts; 2) only lightweight trainable parameters are required, so it is friendly to low data resources. Extensive experiments show that our method significantly improves the descriptiveness and diversity of generated sentences for web images. The code is available at https://github.com/yaolinli/CapEnrich.

Wikidata as a seed for Web Extraction

Wikidata has grown to a knowledge graph with an impressive size. To date, it contains more than 17 billion triples collecting information about people, places, films, stars, publications, proteins, and many more. On the other side, most of the information on the Web is not published in highly structured data repositories like Wikidata, but rather as unstructured and semi-structured content, more concretely in HTML pages containing text and tables. Finding, monitoring, and organizing this data in a knowledge graph is requiring considerable work from human editors. The volume and complexity of the data make this task difficult and time-consuming. In this work, we present a framework that is able to identify and extract new facts that are published under multiple Web domains so that they can be proposed for validation by Wikidata editors. The framework is relying on question-answering technologies. We take inspiration from ideas that are used to extract facts from textual collections and adapt them to extract facts from Web pages. For achieving this, we demonstrate that language models can be adapted to extract facts not only from textual collections but also from Web pages. By exploiting the information already contained in Wikidata the proposed framework can be trained without the need for any additional learning signals and can extract new facts for a wide range of properties and domains. Following this path, Wikidata can be used as a seed to extract facts on the Web. Our experiments show that we can achieve a mean performance of 84.07 at F1-score. Moreover, our estimations show that we can potentially extract millions of facts that can be proposed for human validation. The goal is to help editors in their daily tasks and contribute to the completion of the Wikidata knowledge graph.

Learning Long- and Short-term Representations for Temporal Knowledge Graph Reasoning

Temporal Knowledge graph (TKG) reasoning aims to predict missing facts based on historical TKG data. Most of the existing methods are incapable of explicitly modeling the long-term time dependencies from history and neglect the adaptive integration of the long- and short-term information. To tackle these problems, we propose a novel method that utilizes a designed Hierarchical Relational Graph Neural Network to learn the Long- and Short-term representations for TKG reasoning, namely HGLS. Specifically, to explicitly associate entities in different timestamps, we first transform the TKG into a global graph. Based on the built graph, we design a Hierarchical Relational Graph Neural Network that executes in two levels: The sub-graph level is to capture the semantic dependencies within concurrent facts of each KG. And the global-graph level aims to model the temporal dependencies between entities. Furthermore, we design a module to extract the long- and short-term information from the output of these two levels. Finally, the long- and short-term representations are fused into a unified one by Gating Integration for entity prediction. Extensive experiments on four datasets demonstrate the effectiveness of HGLS.

MLN4KB: an efficient Markov logic network engine for large-scale knowledge bases and structured logic rules

Markov logic network (MLN) is a powerful statistical modeling framework for probabilistic logic reasoning. Despite the elegancy and effectiveness of MLN, the inference of MLN is known to suffer from an efficiency issue. Even the state-of-the-art MLN engines can not scale to medium-size real-world knowledge bases in the open-world setting, i.e., all unobserved facts in the knowledge base need predictions. In this work, by focusing on a certain class of first-order logic rules that are sufficiently expressive, we develop a highly efficient MLN inference engine called MLN4KB that can leverage the sparsity of knowledge bases. MLN4KB enjoys quite strong theoretical properties; its space and time complexities can be exponentially smaller than existing MLN engines. Experiments on both synthetic and real-world knowledge bases demonstrate the effectiveness of the proposed method. MLN4KB is orders of magnitudes faster (more than 103 times faster on some datasets) than existing MLN engines in the open-world setting. Without any approximation tricks, MLN4KB can scale to real-world knowledge bases including WN-18 and YAGO3-10 and achieve decent prediction accuracy without bells and whistles.

We implement MLN4KB as a Julia package called MLN4KB.jl. The package supports both maximum a posteriori (MAP) inference and learning the weights of rules. MLN4KB.jl is public available at https://github.com/baidu-research/MLN4KB .

Meta-Learning Based Knowledge Extrapolation for Temporal Knowledge Graph

In the last few years, the solution to Knowledge Graph (KG) completion via learning embeddings of entities and relations has attracted a surge of interest. Temporal KGs(TKGs) extend traditional Knowledge Graphs (KGs) by associating static triples with timestamps forming quadruples. Different from KGs and TKGs in the transductive setting, constantly emerging entities and relations in incomplete TKGs create demand to predict missing facts with unseen components, which is the extrapolation setting. Traditional temporal knowledge graph embedding (TKGE) methods are limited in the extrapolation setting since they are trained within a fixed set of components. In this paper, we propose a Meta-Learning based Temporal Knowledge Graph Extrapolation (MTKGE) model, which is trained on link prediction tasks sampled from the existing TKGs and tested in the emerging TKGs with unseen entities and relations. Specifically, we meta-train a GNN framework that captures relative position patterns and temporal sequence patterns between relations. The learned embeddings of patterns can be transferred to embed unseen components. Experimental results on two different TKG extrapolation datasets show that MTKGE consistently outperforms both the existing state-of-the-art models for knowledge graph extrapolation and specifically adapted KGE and TKGE baselines.

Heterogeneous Federated Knowledge Graph Embedding Learning and Unlearning

Federated Learning (FL) recently emerges as a paradigm to train a global machine learning model across distributed clients without sharing raw data. Knowledge Graph (KG) embedding represents KGs in a continuous vector space, serving as the backbone of many knowledge-driven applications. As a promising combination, federated KG embedding can fully take advantage of knowledge learned from different clients while preserving the privacy of local data. However, realistic problems such as data heterogeneity and knowledge forgetting still remain to be concerned. In this paper, we propose FedLU, a novel FL framework for heterogeneous KG embedding learning and unlearning. To cope with the drift between local optimization and global convergence caused by data heterogeneity, we propose mutual knowledge distillation to transfer local knowledge to global, and absorb global knowledge back. Moreover, we present an unlearning method based on cognitive neuroscience, which combines retroactive interference and passive decay to erase specific knowledge from local clients and propagate to the global model by reusing knowledge distillation. We construct new datasets for assessing realistic performance of the state-of-the-arts. Extensive experiments show that FedLU achieves superior results in both link prediction and knowledge forgetting.

Can Persistent Homology provide an efficient alternative for Evaluation of Knowledge Graph Completion Methods?

In this paper we present a novel method, Knowledge Persistence (), for faster evaluation of Knowledge Graph (KG) completion approaches. Current ranking-based evaluation is quadratic in the size of the KG, leading to long evaluation times and consequently a high carbon footprint. addresses this by representing the topology of the KG completion methods through the lens of topological data analysis, concretely using persistent homology. The characteristics of persistent homology allow to evaluate the quality of the KG completion looking only at a fraction of the data. Experimental results on standard datasets show that the proposed metric is highly correlated with ranking metrics (Hits@N, MR, MRR). Performance evaluation shows that is computationally efficient: In some cases, the evaluation time (validation+test) of a KG completion method has been reduced from 18 hours (using Hits@10) to 27 seconds (using ), and on average (across methods & data) reduces the evaluation time (validation+test) by ≈ 99.96%.

A Single Vector Is Not Enough: Taxonomy Expansion via Box Embeddings

Taxonomies, which organize knowledge hierarchically, support various practical web applications such as product navigation in online shopping and user profile tagging on social platforms. Given the continued and rapid emergence of new entities, maintaining a comprehensive taxonomy in a timely manner through human annotation is prohibitively expensive. Therefore, expanding a taxonomy automatically with new entities is essential. Most existing methods for expanding taxonomies encode entities into vector embeddings (i.e., single points). However, we argue that vectors are insufficient to model the “is-a” hierarchy in taxonomy (asymmetrical relation), because two points can only represent pairwise similarity (symmetrical relation). To this end, we propose to project taxonomy entities into boxes (i.e., hyperrectangles). Two boxes can be "contained", "disjoint" and "intersecting", thus naturally representing an asymmetrical taxonomic hierarchy. Upon box embeddings, we propose a novel model BoxTaxo for taxonomy expansion. The core of BoxTaxo is to learn boxes for entities to capture their child-parent hierarchies. To achieve this, BoxTaxo optimizes the box embeddings from a joint view of geometry and probability. BoxTaxo also offers an easy and natural way for inference: examine whether the box of a given new entity is fully enclosed inside the box of a candidate parent from the existing taxonomy. Extensive experiments on two benchmarks demonstrate the effectiveness of BoxTaxo compared to vector based models.

Knowledge Graph Question Answering with Ambiguous Query

Knowledge graph question answering aims to identify answers of the query according to the facts in the knowledge graph. In the vast majority of the existing works, the input queries are considered perfect and can precisely express the user’s query intention. However, in reality, input queries might be ambiguous and elusive which only contain a limited amount of information. Directly answering these ambiguous queries may yield unwanted answers and deteriorate user experience. In this paper, we propose PReFNet which focuses on answering ambiguous queries with pseudo relevance feedback on knowledge graphs. In order to leverage the hidden (pseudo) relevance information existed in the results that are initially returned from a given query, PReFNet treats the top-k returned candidate answers as a set of most relevant answers, and uses variational Bayesian inference to infer user’s query intention. To boost the quality of the inferred queries, a neighborhood embedding based VGAE model is used to prune inferior inferred queries. The inferred high quality queries will be returned to the users to help them search with ease. Moreover, all the high-quality candidate nodes will be re-ranked according to the inferred queries. The experiment results show that our proposed method can recommend high-quality query graphs to users and improve the question answering accuracy.

Atrapos: Real-time Evaluation of Metapath Query Workloads

Heterogeneous information networks (HINs) represent different types of entities and relationships between them. Exploring and mining HINs relies on metapath queries that identify pairs of entities connected by relationships of diverse semantics. While the real-time evaluation of metapath query workloads on large, web-scale HINs is highly demanding in computational cost, current approaches do not exploit interrelationships among the queries. In this paper, we present Atrapos, a new approach for the real-time evaluation of metapath query workloads that leverages a combination of efficient sparse matrix multiplication and intermediate result caching. Atrapos selects intermediate results to cache and reuse by detecting frequent sub-metapaths among workload queries in real time, using a tailor-made data structure, the Overlap Tree, and an associated caching policy. Our experimental study on real data shows that Atrapos  accelerates exploratory data analysis and mining on HINs, outperforming off-the-shelf caching approaches and state-of-the-art research prototypes in all examined scenarios.

Attribute-Consistent Knowledge Graph Representation Learning for Multi-Modal Entity Alignment

The multi-modal entity alignment (MMEA) aims to find all equivalent entity pairs between multi-modal knowledge graphs (MMKGs). Rich attributes and neighboring entities are valuable for the alignment task, but existing works ignore contextual gap problems that the aligned entities have different numbers of attributes on specific modality when learning entity representations. In this paper, we propose a novel attribute-consistent knowledge graph representation learning framework for MMEA (ACK-MMEA) to compensate the contextual gaps through incorporating consistent alignment knowledge. Attribute-consistent KGs (ACKGs) are first constructed via multi-modal attribute uniformization with merge and generate operators so that each entity has one and only one uniform feature in each modality. The ACKGs are then fed into a relation-aware graph neural network with random dropouts, to obtain aggregated relation representations and robust entity representations. In order to evaluate the ACK-MMEA facilitated for entity alignment, we specially design a joint alignment loss for both entity and attribute evaluation. Extensive experiments conducted on two benchmark datasets show that our approach achieves excellent performance compared to its competitors.

TaxoComplete: Self-Supervised Taxonomy Completion Leveraging Position-Enhanced Semantic Matching

Taxonomies are used to organize knowledge in many applications, including recommender systems, content browsing, or web search. With the emergence of new concepts, static taxonomies become obsolete as they fail to capture up-to-date knowledge. Several approaches have been proposed to address the problem of maintaining taxonomies automatically. These approaches typically rely on a limited set of neighbors to represent a given node in the taxonomy. However, considering distant nodes could improve the representation of some portions of the taxonomy, especially for those nodes situated in the periphery or in sparse regions of the taxonomy.

In this work, we propose TaxoComplete, a self-supervised taxonomy completion framework that learns the representation of nodes leveraging their position in the taxonomy. TaxoComplete uses a self-supervision generation process that selects some nodes and associates each of them with an anchor set, which is a set composed of nodes in the close and distant neighborhood of the selected node. Using self-supervision data, TaxoComplete learns a position-enhanced node representation using two components: (1) a query-anchor semantic matching mechanism, which encodes pairs of nodes and matches their semantic distance to their graph distance, such that nodes that are close in the taxonomy are placed closely in the shared embedding space while distant nodes are placed further apart; (2) a direction-aware propagation module, which embeds the direction of edges in node representation, such that we discriminate <node, parent> relation from other taxonomic relations. Our approach allows the representation of nodes to encapsulate information from a large neighborhood while being aware of the distance separating pairs of nodes in the taxonomy. Extensive experiments on four real-world and large-scale datasets show that TaxoComplete is substantially more effective than state-of-the-art methods (2x more effective in terms of HIT@k).

Hierarchy-Aware Multi-Hop Question Answering over Knowledge Graphs

Knowledge graphs (KGs) have been widely used to enhance complex question answering (QA). To understand complex questions, existing studies employ language models (LMs) to encode contexts. Despite the simplicity, they neglect the latent relational information among question concepts and answers in KGs. While question concepts ubiquitously present hyponymy at the semantic level, e.g., mammals and animals, this feature is identically reflected in the hierarchical relations in KGs, e.g., a_type_of. Therefore, we are motivated to explore comprehensive reasoning by the hierarchical structures in KGs to help understand questions. However, it is non-trivial to reason over tree-like structures compared with chained paths. Moreover, identifying appropriate hierarchies relies on expertise. To this end, we propose HamQA, a novel Hierarchy-aware multi-hop Question Answering framework on knowledge graphs, to effectively align the mutual hierarchical information between question contexts and KGs. The entire learning is conducted in Hyperbolic space, inspired by its advantages of embedding hierarchical structures. Specifically, (i) we design a context-aware graph attentive network to capture context information. (ii) Hierarchical structures are continuously preserved in KGs by minimizing the Hyperbolic geodesic distances. The comprehensive reasoning is conducted to jointly train both components and provide a top-ranked candidate as an optimal answer. We achieve a higher ranking than the state-of-the-art multi-hop baselines on the official OpenBookQA leaderboard with an accuracy of 85%.

Unsupervised Entity Alignment for Temporal Knowledge Graphs

Entity alignment (EA) is a fundamental data integration task that identifies equivalent entities between different knowledge graphs (KGs). Temporal Knowledge graphs (TKGs) extend traditional knowledge graphs by introducing timestamps, which have received increasing attention. State-of-the-art time-aware EA studies have suggested that the temporal information of TKGs facilitates the performance of EA. However, existing studies have not thoroughly exploited the advantages of temporal information in TKGs. Also, they perform EA by pre-aligning entity pairs, which can be labor-intensive and thus inefficient. In this paper, we present DualMatch that effectively fuses the relational and temporal information for EA. DualMatch transfers EA on TKGs into a weighted graph matching problem. More specifically, DualMatch is equipped with an unsupervised method, which achieves EA without necessitating the seed alignment. DualMatch has two steps: (i) encoding temporal and relational information into embeddings separately using a novel label-free encoder, Dual-Encoder; and (ii) fusing both information and transforming it into alignment using a novel graph-matching-based decoder, GM-Decoder. DualMatch is able to perform EA on TKGs with or without supervision, due to its capability of effectively capturing temporal information. Extensive experiments on three real-world TKG datasets offer the insight that DualMatch significantly outperforms the state-of-the-art methods.

Hierarchical Self-Attention Embedding for Temporal Knowledge Graph Completion

Temporal Knowledge Graph (TKG) is composed of a series of facts related to timestamps in the real world and has become the basis of many artificial intelligence applications. However, the existing TKG is usually incomplete. It has become a hot research task to infer missing facts based on existing facts in a TKG; namely, Temporal Knowledge Graph Completion (TKGC). The current mainstream TKGC models are embedded models that predict missing facts by representing entities, relations and timestamps as low-dimensional vectors. In order to deal with the TKG structure information, there are some models that try to introduce attention mechanism into the embedding process. But they only consider the structure information of entities or relations, and ignore the structure information of the whole TKG. Moreover, most of them usually treat timestamps as a general feature and cannot take advantage of the potential time series information of the timestamp. To solve these problems, wo propose a new Hierarchical Self-Attention Embedding (HSAE) model which inspired by self-attention mechanism and diachronic embedding technique. For structure information of the whole TKG, we divide the TKG into two layers: entity layer and relation layer, and then apply the self-attention mechanism to the entity layer and relation layer respectively to capture the structure information. For time series information of the timestamp, we capture them by combining positional encoding and diachronic embedding technique into the above two self-attention layers. Finally, we can get the embedded representation vectors of entities, relations and timestamps, which can be combined with other models for better results. We evaluate our model on three TKG datasets: ICEWS14, ICEWS05-15 and GDELT. Experimental results on the TKGC (interpolation) task demonstrate that our model achieves state-of-the-art results.

KRACL: Contrastive Learning with Graph Context Modeling for Sparse Knowledge Graph Completion

Knowledge Graph Embeddings (KGE) aim to map entities and relations to low dimensional spaces and have become the de-facto standard for knowledge graph completion. Most existing KGE methods suffer from the sparsity challenge, where it is harder to predict entities that appear less frequently in knowledge graphs. In this work, we propose a novel framework KRACL1 to alleviate the widespread sparsity in KGs with graph context and contrastive learning. Firstly, we propose the Knowledge Relational Attention Network (KRAT) to leverage the graph context by simultaneously projecting neighboring triples to different latent spaces and jointly aggregating messages with the attention mechanism. KRAT is capable of capturing the subtle semantic information and importance of different context triples as well as leveraging multi-hop information in knowledge graphs. Secondly, we propose the knowledge contrastive loss by combining the contrastive loss with cross entropy loss, which introduces more negative samples and thus enriches the feedback to sparse entities. Our experiments demonstrate that KRACL achieves superior results across various standard knowledge graph benchmarks, especially on WN18RR and NELL-995 which have large numbers of low in-degree entities. Extensive experiments also bear out KRACL’s effectiveness in handling sparse knowledge graphs and robustness against noisy triples.

TRAVERS: A Diversity-Based Dynamic Approach to Iterative Relevance Search over Knowledge Graphs

Relevance search over knowledge graphs seeks top-ranked answer entities that are most relevant to a query entity. Since the semantics of relevance varies with the user need and its formalization is difficult for non-experts, existing methods infer semantics from user-provided example answer entities. However, a user may provide very few examples, even none at the beginning of interaction, thereby limiting the effectiveness of such methods. In this paper, we vision a more practical scenario called labeling-based iterative relevance search: instead of effortfully inputting example answer entities, the user effortlessly (e.g., implicitly) labels current answer entities, and is rewarded with improved answer entities in the next iteration. To realize the scenario, our approach TRAVERS incorporates two rankers: a diversity-oriented ranker for supporting cold start and avoiding converging to sub-optimum caused by noisy labels, and a relevance-oriented ranker capable of handling unbalanced labels. Moreover, the two rankers and their combination dynamically evolve over iterations. TRAVERS outperformed a variety of baselines in experiments with simulated and real user behavior.

IMF: Interactive Multimodal Fusion Model for Link Prediction

Link prediction aims to identify potential missing triples in knowledge graphs. To get better results, some recent studies have introduced multimodal information to link prediction. However, these methods utilize multimodal information separately and neglect the complicated interaction between different modalities. In this paper, we aim at better modeling the inter-modality information and thus introduce a novel Interactive Multimodal Fusion (IMF) model to integrate knowledge from different modalities. To this end, we propose a two-stage multimodal fusion framework to preserve modality-specific knowledge as well as take advantage of the complementarity between different modalities. Instead of directly projecting different modalities into a unified space, our multimodal fusion module limits the representations of different modalities independent while leverages bilinear pooling for fusion and incorporates contrastive learning as additional constraints. Furthermore, the decision fusion module delivers the learned weighted average over the predictions of all modalities to better incorporate the complementarity of different modalities. Our approach has been demonstrated to be effective through empirical evaluations on several real-world datasets. The implementation code is available online at https://github.com/HestiaSky/IMF-Pytorch.

Structure Pretraining and Prompt Tuning for Knowledge Graph Transfer

Knowledge graphs (KG) are essential background knowledge providers in many tasks. When designing models for KG-related tasks, one of the key tasks is to devise the Knowledge Representation and Fusion (KRF) module that learns the representation of elements from KGs and fuses them with task representations. While due to the difference of KGs and perspectives to be considered during fusion across tasks, duplicate and ad hoc KRF modules design are conducted among tasks. In this paper, we propose a novel knowledge graph pretraining model KGTransformer that could serve as a uniform KRF module in diverse KG-related tasks. We pretrain KGTransformer with three self-supervised tasks with sampled sub-graphs as input. For utilization, we propose a general prompt-tuning mechanism regarding task data as a triple prompt to allow flexible interactions between task KGs and task data. We evaluate pretrained KGTransformer on three tasks, triple classification, zero-shot image classification, and question answering. KGTransformer consistently achieves better results than specifically designed task models. Through experiments, we justify that the pretrained KGTransformer could be used off the shelf as a general and effective KRF module across KG-related tasks. The code and datasets are available at https://github.com/zjukg/KGTransformer.

TEA: Time-aware Entity Alignment in Knowledge Graphs

Entity alignment (EA) aims to identify equivalent entities between knowledge graphs (KGs), which is a key technique to improve the coverage of existing KGs. Current EA models largely ignore the importance of time information contained in KGs and treat relational facts or attribute values of entities as time-invariant. However, real-world entities could evolve over time, making the knowledge of the aligned entities very different in multiple KGs. This may cause incorrect matching between KGs if such entity dynamics is ignored. In this paper, we propose a time-aware entity alignment (TEA) model that discovers the entity evolving behaviour by exploring the time contexts in KGs and aggregates various contextual information to make the alignment decision. In particular, we address two main challenges in the TEA model: 1) How to identify highly-correlated temporal facts; 2) How to capture entity dynamics and incorporate it to learn a more informative entity representation for the alignment task. Experiments on real-world datasets1 verify the superiority of our TEA model over state-of-the-art entity aligners.

Link Prediction with Attention Applied on Multiple Knowledge Graph Embedding Models

Predicting missing links between entities in a knowledge graph is a fundamental task to deal with the incompleteness of data on the Web. Knowledge graph embeddings map nodes into a vector space to predict new links, scoring them according to geometric criteria. Relations in the graph may follow patterns that can be learned, e.g., some relations might be symmetric and others might be hierarchical. However, the learning capability of different embedding models varies for each pattern and, so far, no single model can learn all patterns equally well. In this paper, we combine the query representations from several models in a unified one to incorporate patterns that are independently captured by each model. Our combination uses attention to select the most suitable model to answer each query. The models are also mapped onto a non-Euclidean manifold, the Poincaré ball, to capture structural patterns, such as hierarchies, besides relational patterns, such as symmetry. We prove that our combination provides a higher expressiveness and inference power than each model on its own. As a result, the combined model can learn relational and structural patterns. We conduct extensive experimental analysis with various link prediction benchmarks showing that the combined model outperforms individual models, including state-of-the-art approaches.

Knowledge Graph Completion with Counterfactual Augmentation

Graph Neural Networks (GNNs) have demonstrated great success in Knowledge Graph Completion (KGC) by modeling how entities and relations interact in recent years. However, most of them are designed to learn from the observed graph structure, which appears to have imbalanced relation distribution during the training stage. Motivated by the causal relationship among the entities on a knowledge graph, we explore this defect through a counterfactual question: “would the relation still exist if the neighborhood of entities became different from observation?”. With a carefully designed instantiation of a causal model on the knowledge graph, we generate the counterfactual relations to answer the question by regarding the representations of entity pair given relation as context, structural information of relation-aware neighborhood as treatment, and validity of the composed triplet as the outcome. Furthermore, we incorporate the created counterfactual relations with the GNN-based framework on KGs to augment their learning of entity pair representations from both the observed and counterfactual relations. Experiments on benchmarks show that our proposed method outperforms existing methods on the task of KGC, achieving new state-of-the-art results. Moreover, we demonstrate that the proposed counterfactual relations-based augmentation also enhances the interpretability of the GNN-based framework through the path interpretations of predictions.

Mutually-paced Knowledge Distillation for Cross-lingual Temporal Knowledge Graph Reasoning

This paper investigates cross-lingual temporal knowledge graph reasoning problem, which aims to facilitate reasoning on Temporal Knowledge Graphs (TKGs) in low-resource languages by transfering knowledge from TKGs in high-resource ones. The cross-lingual distillation ability across TKGs becomes increasingly crucial, in light of the unsatisfying performance of existing reasoning methods on those severely incomplete TKGs, especially in low-resource languages. However, it poses tremendous challenges in two aspects. First, the cross-lingual alignments, which serve as bridges for knowledge transfer, are usually too scarce to transfer sufficient knowledge between two TKGs. Second, temporal knowledge discrepancy of the aligned entities, especially when alignments are unreliable, can mislead the knowledge distillation process. We correspondingly propose a mutually-paced knowledge distillation model MP-KD, where a teacher network trained on a source TKG can guide the training of a student network on target TKGs with an alignment module. Concretely, to deal with the scarcity issue, MP-KD generates pseudo alignments between TKGs based on the temporal information extracted by our representation module. To maximize the efficacy of knowledge transfer and control the noise caused by the temporal knowledge discrepancy, we enhance MP-KD with a temporal cross-lingual attention mechanism to dynamically estimate the alignment strength. The two procedures are mutually paced along with model training. Extensive experiments on twelve cross-lingual TKG transfer tasks in the EventKG benchmark demonstrate the effectiveness of the proposed MP-KD method.

Message Function Search for Knowledge Graph Embedding

Recently, many promising embedding models have been proposed to embed knowledge graphs (KGs) and their more general forms, such as n-ary relational data (NRD) and hyper-relational KG (HKG). To promote the data adaptability and performance of embedding models, KG searching methods propose to search for suitable models for a given KG data set. But they are restricted to a single KG form, and the searched models are restricted to a single type of embedding model. To tackle such issues, we propose to build a search space for the message function in graph neural networks (GNNs). However, it is a non-trivial task. Existing message function designs fix the structures and operators, which makes them difficult to handle different KG forms and data sets. Therefore, we first design a novel message function space, which enables both structures and operators to be searched for the given KG form (including KG, NRD, and HKG) and data. The proposed space can flexibly take different KG forms as inputs and is expressive to search for different types of embedding models. Especially, some existing message function designs and some classic KG embedding models can be instantiated as special cases of our space. We empirically show that the searched message functions are data-dependent, and can achieve leading performance on benchmark KGs, NRD, and HKGs.

SESSION: Web & Society

Cashing in on Contacts: Characterizing the OnlyFans Ecosystem

Adult video-sharing has undergone dramatic shifts. New platforms that directly interconnect (often amateur) producers and consumers now allow content creators to promote material across the web and directly monetize the content they produce. OnlyFans is the most prominent example of this new trend. OnlyFans is a content subscription service where creators earn money from users who subscribe to their material. In contrast to prior adult platforms, OnlyFans emphasizes creator-consumer interaction for audience accumulation and maintenance. This results in a wide cross-platform ecosystem geared towards bringing consumers to creators’ accounts. In this paper, we inspect this emerging ecosystem, focusing on content creators and the third-party platforms they connect to.

Learning Social Meta-knowledge for Nowcasting Human Mobility in Disaster

Human mobility nowcasting is a fundamental research problem for intelligent transportation planning, disaster responses and management, etc. In particular, human mobility under big disasters such as hurricanes and pandemics deviates from its daily routine to a large extent, which makes the task more challenging. Existing works mainly focus on traffic or crowd flow prediction in normal situations. To tackle this problem, in this study, disaster-related Twitter data is incorporated as a covariate to understand the public awareness and attention about the disaster events and thus perceive their impacts on the human mobility. Accordingly, we propose a Meta-knowledge-Memorizable Spatio-Temporal Network (MemeSTN), which leverages memory network and meta-learning to fuse social media and human mobility data. Extensive experiments over three real-world disasters including Japan 2019 typhoon season, Japan 2020 COVID-19 pandemic, and US 2019 hurricane season were conducted to illustrate the effectiveness of our proposed solution. Compared to the state-of-the-art spatio-temporal deep models and multivariate-time-series deep models, our model can achieve superior performance for nowcasting human mobility in disaster situations at both country level and state level.

Automated Content Moderation Increases Adherence to Community Guidelines

Online social media platforms use automated moderation systems to remove or reduce the visibility of rule-breaking content. While previous work has documented the importance of manual content moderation, the effects of automated content moderation remain largely unknown. Here, in a large study of Facebook comments (n = 412M), we used a fuzzy regression discontinuity design to measure the impact of automated content moderation on subsequent rule-breaking behavior (number of comments hidden/deleted) and engagement (number of additional comments posted). We found that comment deletion decreased subsequent rule-breaking behavior in shorter threads (20 or fewer comments), even among other participants, suggesting that the intervention prevented conversations from derailing. Further, the effect of deletion on the affected user’s subsequent rule-breaking behavior was longer-lived than its effect on reducing commenting in general, suggesting that users were deterred from rule-breaking but not from commenting. In contrast, hiding (rather than deleting) content had small and statistically insignificant effects. Our results suggest that automated content moderation increases adherence to community guidelines.

Mental Health Coping Stories on Social Media: A Causal-Inference Study of Papageno Effect

The Papageno effect concerns how media can play a positive role in preventing and mitigating suicidal ideation and behaviors. With the increasing ubiquity and widespread use of social media, individuals often express and share lived experiences and struggles with mental health. However, there is a gap in our understanding about the existence and effectiveness of the Papageno effect in social media, which we study in this paper. In particular, we adopt a causal-inference framework to examine the impact of exposure to mental health coping stories on individuals on Twitter. We obtain a Twitter dataset with ∼ 2M posts by ∼ 10K individuals. We consider engaging with coping stories as the Treatment intervention, and adopt a stratified propensity score approach to find matched cohorts of Treatment and Control individuals. We measure the psychosocial shifts in affective, behavioral, and cognitive outcomes in longitudinal Twitter data before and after engaging with the coping stories. Our findings reveal that, engaging with coping stories leads to decreased stress and depression, and improved expressive writing, diversity, and interactivity. Our work discusses the practical and platform design implications in supporting mental wellbeing.

Misbehavior and Account Suspension in an Online Financial Communication Platform

The expanding accessibility and appeal of investing have attracted millions of new retail investors. As such, investment discussion boards became the de facto communities where traders create, disseminate, and discuss investing ideas. These communities, which can provide useful information to support investors, have anecdotally also attracted a wide range of misbehavior – toxicity, spam/fraud, and reputation manipulation. This paper is the first comprehensive analysis of online misbehavior in the context of investment communities. We study TradingView, the largest online communication platform for financial trading. We collect 2.76M user profiles with their corresponding social graphs, 4.2M historical article posts, and 5.3M comments, including information on nearly 4 000 suspended accounts and 17 000 removed comments. Price fluctuations seem to drive abuse across the platform and certain types of assets, such as “meme” stocks, attract disproportionate misbehavior. Suspended user accounts tend to form more closely-knit communities than those formed by non-suspended accounts; and paying accounts are less likely to be suspended than free accounts even when posting similar levels of content violating platform policies. We conclude by offering guidelines on how to adapt content moderation efforts to fit the particularities of online investment communities.

Reinforcement Learning-based Counter-Misinformation Response Generation: A Case Study of COVID-19 Vaccine Misinformation

The spread of online misinformation threatens public health, democracy, and the broader society. While professional fact-checkers form the first line of defense by fact-checking popular false claims, they do not engage directly in conversations with misinformation spreaders. On the other hand, non-expert ordinary users act as eyes-on-the-ground who proactively counter misinformation – recent research has shown that 96% counter-misinformation responses are made by ordinary users. However, research also found that 2/3 times, these responses are rude and lack evidence. This work seeks to create a counter-misinformation response generation model to empower users to effectively correct misinformation. This objective is challenging due to the absence of datasets containing ground-truth of ideal counter-misinformation responses, and the lack of models that can generate responses backed by communication theories. In this work, we create two novel datasets of misinformation and counter-misinformation response pairs from in-the-wild social media and crowdsourcing from college-educated students. We annotate the collected data to distinguish poor from ideal responses that are factual, polite, and refute misinformation. We propose MisinfoCorrect, a reinforcement learning-based framework that learns to generate counter-misinformation responses for an input misinformation post. The model rewards the generator to increase the politeness, factuality, and refutation attitude while retaining text fluency and relevancy. Quantitative and qualitative evaluation shows that our model outperforms several baselines by generating high-quality counter-responses. This work illustrates the promise of generative text models for social good – here, to help create a safe and reliable information ecosystem. The code and data is accessible on https://github.com/claws-lab/MisinfoCorrect.

MassNE: Exploring Higher-Order Interactions with Marginal Effect for Massive Battle Outcome Prediction

In online games, predicting massive battle outcomes is a fundamental task of many applications, such as team optimization and tactical formulation. Existing works do not pay adequate attention to the massive battle. They either seek to evaluate individuals in isolation or mine simple pair-wise interactions between individuals, neither of which effectively captures the intricate interactions between massive units (e.g., individuals). Furthermore, as the team size increases, the phenomenon of diminishing marginal utility of units emerges. Such a diminishing pattern is rarely noticed in previous work, and how to capture it from data remains a challenge. To this end, we propose a novel Massive battle outcome predictor with margiNal Effect modules, namely MassNE, which comprehensively incorporates individual effects, cooperation effects (i.e., intra-team interactions) and suppression effects (i.e., inter-team interactions) for predicting battle outcomes. Specifically, we design marginal effect modules to learn how units’ marginal utility changing respect to their number, where the monotonicity assumption is applied to ensure rationality. In addition, we evaluate the current classical models and provide mathematical proofs that MassNE is able to generalize several earlier works in massive settings. Massive battle datasets generated by StarCraft II APIs are adopted to evaluate the performances of MassNE. Extensive experiments empirically demonstrate the effectiveness of MassNE, and MassNE can reveal reasonable cooperation effects, suppression effects, and marginal utilities of combat units from the data.

Large-Scale Analysis of New Employee Network Dynamics

The COVID-19 pandemic has accelerated digital transformations across industries, but also introduced new challenges into workplaces, including the difficulties of effectively socializing with colleagues when working remotely. This challenge is exacerbated for new employees who need to develop workplace networks from the outset. In this paper, by analyzing a large-scale telemetry dataset of more than 10,000 Microsoft employees who joined the company in the first three months of 2022, we describe how new employees interact and telecommute with their colleagues during their “onboarding” period. Our results reveal that although new hires are gradually expanding networks over time, there still exists significant gaps between their network statistics and those of tenured employees even after the six-month onboarding phase. We also observe that heterogeneity exists among new employees in how their networks change over time, where employees whose job tasks do not necessarily require extensive and diverse connections could be at a disadvantaged position in this onboarding process. By investigating how web-based people recommendations in organizational knowledge base facilitate new employees naturally expand their networks, we also demonstrate the potential of web-based applications for addressing the aforementioned socialization challenges. Altogether, our findings provide insights on new employee network dynamics in remote and hybrid work environments, which may help guide organizational leaders and web application developers on quantifying and improving the socialization experiences of new employees in digital workplaces.

A First Look at Public Service Websites from the Affordability Lens

Public service websites act as official gateways to services provided by governments. Many of these websites are essential for citizens to receive reliable information and online government services. However, the lack of affordability of mobile broadband services in many developing countries and the rising complexity of websites create barriers for citizens in accessing these government websites. This paper presents the first large-scale analysis of the affordability of public service websites in developing countries. We do this by collecting a corpus of 1900 public service websites, including public websites from nine developing countries and for comparison websites from nine developed countries. Our investigation is driven by website complexity analysis as well as evaluation through a recently proposed affordability index. Our analysis reveals that, in general, public service websites in developing countries do not meet the affordability target set by the UN’s Broadband Commission. However, we show that several countries can be brought within or closer to the affordability target by implementing webpage optimizations to reduce page sizes. We also discuss policy interventions that can help make access to public service website more affordable.

Propaganda Política Pagada: Exploring U.S. Political Facebook Ads en Español

In 2021, the U.S. Hispanic population totaled 62.5 million people, 68% of whom spoke Spanish in their homes. To date, it is unclear which political advertisers address this audience in their preferred language, and whether they do so differently than for English-speaking audiences. In this work, we study differences between political Facebook ads in English and Spanish during 2020, the latest U.S. presidential election. Political advertisers spent $ 1.48 B in English, but only $ 28.8 M in Spanish, disproportionately little compared to the share of Spanish speakers in the population. We further find a lower proportion of election-related advertisers (which additionally are more liberal-leaning than in the English set), and a higher proportion of government agencies in the set of Spanish ads. We perform multilingual topic classification, finding that the most common ad topics in English were also present in Spanish, but to a different extent, and with a different composition of advertisers. Thus, Spanish speakers are served different types of ads from different types of advertisers than English speakers, and in lower amounts; these results raise the question of whether political communication through Facebook ads may be inequitable and effectively disadvantaging the sizeable minority of Spanish speakers in the U.S. population.

Migration Reframed? A multilingual analysis on the stance shift in Europe during the Ukrainian crisis

The war in Ukraine seems to have positively changed the attitude toward the critical societal topic of migration in Europe – at least towards refugees from Ukraine. We investigate whether this impression is substantiated by how the topic is reflected in online news and social media, thus linking the representation of the issue on the Web to its perception in society. For this purpose, we combine and adapt leading-edge automatic text processing for a novel multilingual stance detection approach. Starting from 5.5M Twitter posts published by 565 European news outlets in one year, beginning September 2021, plus replies, we perform a multilingual analysis of migration-related media coverage and associated social media interaction for Europe and selected European countries.

The results of our analysis show that there is actually a reframing of the discussion illustrated by the terminology change, e.g., from “migrant” to “refugee”, often even accentuated with phrases such as “real refugees”. However, concerning a stance shift in public perception, the picture is more diverse than expected. All analyzed cases show a noticeable temporal stance shift around the start of the war in Ukraine. Still, there are apparent national differences in the size and stability of this shift.

Who Funds Misinformation? A Systematic Analysis of the Ad-related Profit Routines of Fake News Sites

Fake news is an age-old phenomenon, widely assumed to be associated with political propaganda published to sway public opinion. Yet, with the growth of social media, it has become a lucrative business for Web publishers. Despite many studies performed and countermeasures proposed, unreliable news sites have increased in the last years their share of engagement among the top performing news sources. Stifling fake news impact depends on our efforts in limiting the (economic) incentives of fake news producers.

In this paper, we aim at enhancing the transparency around these exact incentives, and explore: Who supports the existence of fake news websites via paid ads, either as an advertiser or an ad seller? Who owns these websites and what other Web business are they into? We are the first to systematize the auditing process of fake news revenue flows. We identify the companies that advertise in fake news websites and the intermediary companies responsible for facilitating those ad revenues. We study more than 2,400 popular news websites and show that well-known ad networks, such as Google and IndexExchange, have a direct advertising relation with more than 40% of fake news websites. Using a graph clustering approach on 114.5K sites, we show that entities who own fake news sites, also operate other types of websites pointing to the fact that owning a fake news website is part of a broader business operation.

Evidence of Demographic rather than Ideological Segregation in News Discussion on Reddit

We evaluate homophily and heterophily among ideological and demographic groups in a typical opinion formation context: online discussions of current news. We analyze user interactions across five years in the r/news community on Reddit, one of the most visited websites in the United States. Then, we estimate demographic and ideological attributes of these users. Thanks to a comparison with a carefully-crafted network null model, we establish which pairs of attributes foster interactions and which ones inhibit them.

Individuals prefer to engage with the opposite ideological side, which contradicts the echo chamber narrative. Instead, demographic groups are homophilic, as individuals tend to interact within their own group—even in an online setting where such attributes are not directly observable. In particular, we observe age and income segregation consistently across years: users tend to avoid interactions when belonging to different groups. These results persist after controlling for the degree of interest by each demographic group in different news topics. Our findings align with the theory that affective polarization—the difficulty in socializing across political boundaries—is more connected with an increasingly divided society, rather than ideological echo chambers on social media. We publicly release our anonymized data set and all the code to reproduce our results.1

Online Advertising in Ukraine and Russia During the 2022 Russian Invasion

Online ads are a major source of information on the web. The mass reach of online advertising is often leveraged for information dissemination, at times with an objective to influence public opinion (e.g., election misinformation). We hypothesized that online advertising, due to its reach and potential, might have been used to spread information around the 2022 Russian invasion of Ukraine. Thus, to understand the online ad ecosystem during this conflict, we conducted a five-month long large-scale measurement study of online advertising in Ukraine, Russia, and the US. We studied advertising trends of ad platforms that delivered ads in Ukraine, Russia, and the US and conducted an in-depth qualitative analysis of the conflict-related ad content. We found that prominent US-based advertisers continued to support Russian websites, and a portion of online ads were used to spread conflict-related information, including protesting the invasion, and spreading awareness, which might have otherwise potentially been censored in Russia.

Understanding the Behaviors of Toxic Accounts on Reddit

Toxic comments are the top form of hate and harassment experienced online. While many studies have investigated the types of toxic comments posted online, the effects that such content has on people, and the impact of potential defenses, no study has captured the behaviors of the accounts that post toxic comments or how such attacks are operationalized. In this paper, we present a measurement study of 929K accounts that post toxic comments on Reddit over an 18 month period. Combined, these accounts posted over 14 million toxic comments that encompass insults, identity attacks, threats of violence, and sexual harassment. We explore the impact that these accounts have on Reddit, the targeting strategies that abusive accounts adopt, and the distinct patterns that distinguish classes of abusive accounts. Our analysis informs the nuanced interventions needed to curb unwanted toxic behaviors online.

Online Reviews Are Leading Indicators of Changes in K-12 School Attributes

School rating websites are increasingly used by parents to assess the quality and fit of U.S. K-12 schools for their children. These online reviews often contain detailed descriptions of a school’s strengths and weaknesses, which both reflect and inform perceptions of a school. Existing work on these text reviews has focused on finding words or themes that underlie these perceptions, but has stopped short of using the textual reviews as leading indicators of school performance. In this paper, we investigate to what extent the language used in online reviews of a school is predictive of changes in the attributes of that school, such as its socio-economic makeup and student test scores. Using over 300K reviews of 70K U.S. schools from a popular ratings website, we apply language processing models to predict whether schools will significantly increase or decrease in an attribute of interest over a future time horizon. We find that using the text improves predictive performance significantly over a baseline model that does not include text but only the historical time-series of the indicators themselves, suggesting that the review text carries predictive power. A qualitative analysis of the most predictive terms and phrases used in the text reviews indicates a number of topics that serve as leading indicators, such as diversity, changes in school leadership, a focus on testing, and school safety.

SeqCare: Sequential Training with External Medical Knowledge Graph for Diagnosis Prediction in Healthcare Data

Deep learning techniques are capable of capturing complex input-output relationships, and have been widely applied to the diagnosis prediction task based on web-based patient electronic health records (EHR) data. To improve the prediction and interpretability of pure data-driven deep learning with only a limited amount of labeled data, a pervasive trend is to assist the model training with knowledge priors from online medical knowledge graphs. However, they marginally investigated the label imbalance and the task-irrelevant noise in the external knowledge graph. The imbalanced label distribution would bias the learning and knowledge extraction towards the majority categories. The task-irrelevant noise introduces extra uncertainty to the model performance. To this end, aiming at by-passing the bias-variance trade-off dilemma, we introduce a new sequential learning framework, dubbed SeqCare, for diagnosis prediction with online medical knowledge graphs. Concretely, in the first step, SeqCare learns a bias-reduced space through a self-supervised graph contrastive learning task. Secondly, SeqCare reduces the learning uncertainty by refining the supervision signal and the graph structure of the knowledge graph simultaneously. Lastly, SeqCare trains the model in the bias-variance reduced space with a self-distillation to further filter out irrelevant information in the data. Experimental evaluations on two real-world datasets show that SeqCare outperforms state-of-the-art approaches. Case studies exemplify the interpretability of SeqCare. Moreover, the medical findings discovered by SeqCare are consistent with experts and medical literature.

Longitudinal Assessment of Reference Quality on Wikipedia

Wikipedia plays a crucial role in the integrity of the Web. This work analyzes the reliability of this global encyclopedia through the lens of its references. We operationalize the notion of reference quality by defining reference need (RN), i.e., the percentage of sentences missing a citation, and reference risk (RR), i.e., the proportion of non-authoritative references. We release Citation Detective, a tool for automatically calculating the RN score, and discover that the RN score has dropped by 20 percent point in the last decade, with more than half of verifiable statements now accompanying references. The RR score has remained below 1% over the years as a result of the efforts of the community to eliminate unreliable references. We propose pairing novice and experienced editors on the same Wikipedia article as a strategy to enhance reference quality. Our quasi-experiment indicates that such a co-editing experience can result in a lasting advantage in identifying unreliable sources in future edits. As Wikipedia is frequently used as the ground truth for numerous Web applications, our findings and suggestions on its reliability can have a far-reaching impact. We discuss the possibility of other Web services adopting Wiki-style user collaboration to eliminate unreliable content.

Gateway Entities in Problematic Trajectories

Social media platforms like Facebook and YouTube connect people with communities that reflect their own values and experiences. People discover new communities either organically or through algorithmic recommendations based on their interests and preferences. We study online journeys users take through these communities, focusing particularly on ones that may lead to problematic outcomes. In particular, we propose and explore the concept of gateways, namely, entities associated with a higher likelihood of subsequent engagement with problematic content. We show, via a real-world application on Facebook groups, that a simple definition of gateway entities can be leveraged to reduce exposure to problematic content by 1% without any adverse impact on user engagement metrics. Motivated by this finding, we propose several formal definitions of gateways, via both frequentist and survival analysis methods, and evaluate their efficacy in predicting user behavior through offline experiments. Frequentist, duration-insensitive methods predict future harmful engagements with an 0.64–0.83 AUC, while survival analysis methods improve this to 0.72–0.90 AUC.

The Thin Ideology of Populist Advertising on Facebook during the 2019 EU Elections

Social media has been an important tool in the expansion of the populist message, and it is thought to have contributed to the electoral success of populist parties in the past decade. This study compares how populist parties advertised on Facebook during the 2019 European Parliamentary election. In particular, we examine commonalities and differences in which audiences they reach and on which issues they focus. By using data from Meta (previously Facebook) Ad Library, we analyze 45k ad campaigns by 39 parties, both populist and mainstream, in Germany, United Kingdom, Italy, Spain, and Poland. While populist parties represent just over 20% of the total expenditure on political ads, they account for 40% of the total impressions—most of which from Eurosceptic and far-right parties—thus hinting at a competitive advantage for populist parties on Facebook. We further find that ads posted by populist parties are more likely to reach male audiences, and sometimes much older ones. In terms of issues, populist politicians focus on monetary policy, state bureaucracy and reforms, and security, while the focus on EU and Brexit is on par with non-populist, mainstream parties. However, issue preferences are largely country-specific, thus supporting the view in political science that populism is a “thin ideology”, that does not have a universal, coherent policy agenda. This study illustrates the usefulness of publicly available advertising data for monitoring the populist outreach to, and engagement with, millions of potential voters, while outlining the limitations of currently available data.

SESSION: Systems and Infrastructure for Web, Mobile Web, and Web of Things

Beyond Fine-Tuning: Efficient and Effective Fed-Tuning for Mobile/Web Users

Fine-tuning is a typical mechanism to achieve model adaptation for mobile/web users, where a model trained by the cloud is further retrained to fit the target user task. While traditional fine-tuning has been proved effective, it only utilizes local data to achieve adaptation, failing to take advantage of the valuable knowledge from other mobile/web users. In this paper, we attempt to extend the local-user fine-tuning to multi-user fed-tuning with the help of Federated Learning (FL). Following the new paradigm, we propose EEFT, a framework aiming to achieve Efficient and Effective Fed-Tuning for mobile/web users. The key idea is to introduce lightweight but effective adaptation modules to the pre-trained model, such that we can freeze the pre-trained model and just focus on optimizing the modules to achieve cost reduction and selective task cooperation. Extensive experiments on our constructed benchmark demonstrate the effectiveness and efficiency of the proposed framework.

Unsupervised Anomaly Detection on Microservice Traces through Graph VAE

The microservice architecture is widely employed in large Internet systems. For each user request, a few of the microservices are called, and a trace is formed to record the tree-like call dependencies among microservices and the time consumption at each call node. Traces are useful in diagnosing system failures, but their complex structures make it difficult to model their patterns and detect their anomalies. In this paper, we propose a novel dual-variable graph variational autoencoder (VAE) for unsupervised anomaly detection on microservice traces. To reconstruct the time consumption of nodes, we propose a novel dispatching layer. We find that the inversion of negative log-likelihood (NLL) appears for some anomalous samples, which makes the anomaly score infeasible for anomaly detection. To address this, we point out that the NLL can be decomposed into KL-divergence and data entropy, whereas lower-dimensional anomalies can introduce an entropy gap with normal inputs. We propose three techniques to mitigate this entropy gap for trace anomaly detection: Bernoulli & Categorical Scaling, Node Count Normalization, and Gaussian Std-Limit. On five trace datasets from a top Internet company, our proposed TraceVAE achieves excellent F-scores.

Automated WebAssembly Function Purpose Identification With Semantics-Aware Analysis

WebAssembly is a recent web standard built for better performance in web applications. The standard defines a binary code format to use as a compilation target for a variety of languages, such as C, C++, and Rust. The standard also defines a text representation for readability, although, WebAssembly modules are difficult to interpret by human readers, regardless of their experience level. This makes it difficult to understand and maintain any existing WebAssembly code. As a result, third-party WebAssembly modules need to be implicitly trusted by developers as verifying the functionality themselves may not be feasible.

To this end, we construct WASPur, a tool to automatically identify the purposes of WebAssembly functions. To build this tool, we first construct an extensive collection of WebAssembly samples that represent the state of WebAssembly. Second, we analyze the dataset and identify the diverse use cases of the collected WebAssembly modules. We leverage the dataset of WebAssembly modules to construct semantics-aware intermediate representations (IR) of the functions in the modules. We encode the function IR for use in a machine learning classifier, and we find that this classifier can predict the similarity of a given function against known named functions with an accuracy rate of 88.07%. We hope our tool will enable inspection of optimized and minified WebAssembly modules that remove function names and most other semantic identifiers.

FedEdge: Accelerating Edge-Assisted Federated Learning

Federated learning (FL) has been widely acknowledged as a promising solution to training machine learning (ML) model training with privacy preservation. To reduce the traffic overheads incurred by FL systems, edge servers have been included between clients and the parameter server to aggregate clients’ local models. Recent studies on this edge-assisted hierarchical FL scheme have focused on ensuring or accelerating model convergence by coping with various factors, e.g., uncertain network conditions, unreliable clients, heterogeneous compute resources, etc. This paper presents our three new discoveries of the edge-assisted hierarchical FL scheme: 1) it wastes significant time during its two-phase training rounds; 2) it does not recognize or utilize model diversity when producing a global model; and 3) it is vulnerable to model poisoning attacks. To overcome these drawbacks, we propose FedEdge, a novel edge-assisted hierarchical FL scheme that accelerates model training with asynchronous local federated training and adaptive model aggregation. Extensive experiments are conducted on two widely-used public datasets. The results demonstrate that, compared with state-of-the-art FL schemes, FedEdge accelerates model convergence by 1.14 × −3.20 ×, and improves model accuracy by 2.14% - 6.63%.

CausIL: Causal Graph for Instance Level Microservice Data

AI-based monitoring has become crucial for cloud-based services due to its scale. A common approach to AI-based monitoring is to detect causal relationships among service components and build a causal graph. Availability of domain information makes cloud systems even better suited for such causal detection approaches. In modern cloud systems, however, auto-scalers dynamically change the number of microservice instances, and a load-balancer manages the load on each instance. This poses a challenge for off-the-shelf causal structure detection techniques as they neither incorporate the system architectural domain information nor provide a way to model distributed compute across varying numbers of service instances. To address this, we develop CausIL, which detects a causal structure among service metrics by considering compute distributed across dynamic instances and incorporating domain knowledge derived from system architecture. Towards the application in cloud systems, CausIL estimates a causal graph using instance-specific variations in performance metrics, modeling multiple instances of a service as independent, conditional on system assumptions. Simulation study shows the efficacy of CausIL over baselines by improving graph estimation accuracy by ∼ 25% as measured by Structural Hamming Distance whereas the real-world dataset demonstrates CausIL’s applicability in deployment settings.

SCTAP: Supporting Scenario-Centric Trigger-Action Programming based on Software-Defined Physical Environments

The physical world we live in is accelerating digitalization with the vigorous development of Internet of Things (IoT). Following this trend, Web of Things (WoT) further enables fast and efficient creation of various applications that perceive and act on the physical world using standard Web technologies. A popular way for creating WoT applications is Trigger-Action Programming (TAP), which allows users to orchestrate the capabilities of IoT devices in the form of “if trigger, then action”. However, existing TAP approaches don’t support scenario-centric WoT applications which involve abstract modeling of physical environments and complex spatio-temporal dependencies between events and actions. In this paper, we propose an approach called SCTAP which supports Scenario-Centric Trigger-Action Programming based on software-defined physical environments. SCTAP defines a structured and conceptual representation for physical environments, which provides the required programming abstractions for WoT applications. Based on the representation, SCTAP defines a grammar for specifying scenario-centric WoT applications with spatio-temporal dependencies. Furthermore, we design a service-based architecture for SCTAP which supports the integration of device access, event perception, environment representation, and rule execution in a loosely-coupled and extensible way. We implement SCTAP as a WoT infrastructure and evaluate it with two case studies including a smart laboratory and a smart coffee house. The results confirm the usability, feasibility and efficiency of SCTAP and its implementation.

Learning Cooperative Oversubscription for Cloud by Chance-Constrained Multi-Agent Reinforcement Learning

Oversubscription is a common practice for improving cloud resource utilization. It allows the cloud service provider to sell more resources than the physical limit, assuming not all users would fully utilize the resources simultaneously. However, how to design an oversubscription policy that improves utilization while satisfying some safety constraints remains an open problem. Existing methods and industrial practices are over-conservative, ignoring the coordination of diverse resource usage patterns and probabilistic constraints. To address these two limitations, this paper formulates the oversubscription for cloud as a chance-constrained optimization problem and proposes an effective Chance-Constrained Multi-Agent Reinforcement Learning (C2MARL) method to solve this problem. Specifically, C2MARL reduces the number of constraints by considering their upper bounds and leverages a multi-agent reinforcement learning paradigm to learn a safe and optimal coordination policy. We evaluate our C2MARL on an internal cloud platform and public cloud datasets. Experiments show that our C2MARL outperforms existing methods in improving utilization () under different levels of safety constraints.

CMDiagnostor: An Ambiguity-Aware Root Cause Localization Approach Based on Call Metric Data

The availability of online services is vital as its strong relevance to revenue and user experience. To ensure online services’ availability, quickly localizing the root causes of system failures is crucial. Given the high resource consumption of traces, call metric data are widely used by existing approaches to construct call graphs in practice. However, ambiguous correspondences between upstream and downstream calls may exist and result in exploring unexpected edges in the constructed call graph. Conducting root cause localization on this graph may lead to misjudgments of real root causes. To the best of our knowledge, we are the first to investigate such ambiguity, which is overlooked in the existing literature. Inspired by the law of large numbers and the Markov properties of network traffic, we propose a regression-based method (named AmSitor) to address this problem effectively. Based on AmSitor, we propose an ambiguity-aware root cause localization approach based on Call Metric Data named CMDiagnostor, containing metric anomaly detection, ambiguity-free call graph construction, root cause exploration, and candidate root cause ranking modules. The comprehensive experimental evaluations conducted on real-world datasets show that our CMDiagnostor can outperform the state-of-the-art approaches by 14% on the top-5 hit rate. Moreover, AmSitor can also be applied to existing baseline approaches separately to improve their performances one step further. The source code is released at https://github.com/NetManAIOps/CMDiagnostor.

Visual-Aware Testing and Debugging for Web Performance Optimization

Web performance optimization services, or web performance optimizers (WPOs), play a critical role in today’s web ecosystem by improving page load speed and saving network traffic. However, WPOs are known for introducing visual distortions that disrupt the users’ web experience. Unfortunately, visual distortions are hard to analyze, test, and debug, due to their subjective measure, dynamic content, and sophisticated WPO implementations.

This paper presents Vetter, a novel and effective system that automatically tests and debugs visual distortions. Its key idea is to reason about the morphology of web pages, which describes the topological forms and scale-free geometrical structures of visual elements. Vetter efficiently calculates morphology and comparatively analyzes the morphologies of web pages before and after a WPO, which acts as a differential test oracle. Such morphology analysis enables Vetter to detect visual distortions accurately and reliably. Vetter further diagnoses the detected visual distortions to pinpoint the root causes in WPOs’ source code. This is achieved by morphological causal inference, which localizes the offending visual elements that trigger the distortion and maps them to the corresponding code. We applied Vetter to four representative WPOs. Vetter discovers 21 unknown defects responsible for 98% visual distortions; 12 of them have been confirmed and 5 have been fixed.

Demystifying Mobile Extended Reality in Web Browsers: How Far Can We Go?

Mobile extended reality (XR) has developed rapidly in recent years. Compared with the app-based XR, XR in web browsers has the advantages of being lightweight and cross-platform, providing users with a pervasive experience. Therefore, many frameworks are emerging to support the development of XR in web browsers. However, little has been known about how well these frameworks perform and how complex XR apps modern web browsers can support on mobile devices. To fill the knowledge gap, in this paper, we conduct an empirical study of mobile XR in web browsers. We select seven most popular web-based XR frameworks and investigate their runtime performance, including 3D rendering, camera capturing, and real-world understanding. We find that current frameworks have the potential to further enhance their performance by increasing GPU utilization or improving computing parallelism. Besides, for 3D scenes with good rendering performance, developers can feel free to add camera capturing with little influence on performance to support augmented reality (AR) and mixed reality (MR) applications. Based on our findings, we draw several practical implications to provide better XR support in web browsers.

Look Deep into the Microservice System Anomaly through Very Sparse Logs

Intensive monitoring and anomaly diagnosis have become a knotty problem for modern microservice architecture due to the dynamics of service dependency. While most previous studies rely heavily on ample monitoring metrics, we raise a fundamental but always neglected issue - the diagnostic metric integrity problem. This paper solves the problem by proposing MicroCU – a novel approach to diagnose microservice systems using very sparse API logs. We design a structure named dynamic causal curves to portray time-varying service dependencies and a temporal dynamics discovery algorithm based on Granger causal intervals. Our algorithm generates a smoother space of causal curves and designs the concept of causal unimodalization to calibrate the causality infidelities brought by missing metrics. Finally, a path search algorithm on dynamic causality graphs is proposed to pinpoint the root cause. Experiments on commercial system cases show that MicroCU outperforms many state-of-the-art approaches and reflects the superiorities of causal unimodalization to raw metric imputation.

FlexiFed: Personalized Federated Learning for Edge Clients with Heterogeneous Model Architectures

Mobile and Web-of-Things (WoT) devices at the network edge account for more than half of the world’s web traffic, making a great data source for various machine learning (ML) applications, particularly federated learning (FL) which offers a promising solution to privacy-preserving ML feeding on these data. FL allows edge mobile and WoT devices to train a shared global ML model under the orchestration of a central parameter server. In the real world, due to resource heterogeneity, these edge devices often train different versions of models (e.g., VGG-16 and VGG-19) or different ML models (e.g., VGG and ResNet) for the same ML task (e.g., computer vision and speech recognition). Existing FL schemes have assumed that participating edge devices share a common model architecture, and thus cannot facilitate FL across edge devices with heterogeneous ML model architectures. We explored this architecture heterogeneity challenge and found that FL can and should accommodate these edge devices to improve model accuracy and accelerate model training. This paper presents our findings and FlexiFed, a novel scheme for FL across edge devices with heterogeneous model architectures, and three model aggregation strategies for accommodating architecture heterogeneity under FlexiFed. Experiments with four widely-used ML models on four public datasets demonstrate 1) the usefulness of FlexiFed; and 2) that compared with the state-of-the-art FL scheme, FlexiFed improves model accuracy by 2.6%-9.7% and accelerates model convergence by 1.24 × -4.04 ×.

DeeProphet: Improving HTTP Adaptive Streaming for Low Latency Live Video by Meticulous Bandwidth Prediction

The performance of HTTP adaptive streaming (HAS) depends heavily on the prediction of end-to-end network bandwidth. The increasingly popular low latency live streaming (LLLS) faces greater challenges since it requires accurate, short-term bandwidth prediction, compared with VOD streaming which needs long-term bandwidth prediction and has good tolerance against prediction error. Part of the challenges comes from the fact that short-term bandwidth experiences both large abrupt changes and uncertain fluctuations. Additionally, it is hard to obtain valid bandwidth measurement samples in LLLS due to its inter-chunk and intra-chunk sending idleness. In this work, we present DeeProphet, a system for accurate bandwidth prediction in LLLS to improve the performance of HAS. DeeProphet overcomes the above challenges by collecting valid measurement samples using fine-grained TCP state information to identify the packet bursting intervals, and by combining the time series model and learning-based model to predict both large change and uncertain fluctuations. Experiment results show that DeeProphet improves the overall QoE by 17.7%-359.2% compared with state-of-the-art LLLS ABR algorithms, and reduces the median bandwidth prediction error to 2.7%.

Is IPFS Ready for Decentralized Video Streaming?

InterPlanetary File System (IPFS) is a peer-to-peer protocol for decentralized content storage and retrieval. The IPFS platform has the potential to help users evade censorship and avoid a central point of failure. IPFS is seeing increasing adoption for distributing various kinds of files, including video. However, the performance of video streaming on IPFS has not been well-studied. We conduct a measurement study with over 28,000 videos hosted on the IPFS network and find that video streaming experiences high stall rates due to relatively high Round Trip Times (RTT). Further, videos are encoded using a single static quality, because of which streaming cannot adapt to different network conditions.

A natural approach is to use adaptive bitrate (ABR) algorithms for streaming, which encode videos in multiple qualities and streams according to the throughput available. However, traditional ABR algorithms perform poorly on IPFS because the throughput cannot be estimated correctly. The main problem is that video segments can be retrieved from multiple sources, making it difficult to estimate the throughput. To overcome this issue, we have designed Telescope, an IPFS-aware ABR system. We conduct experiments on the IPFS network, where IPFS video providers are geographically distributed across the globe. Our results show that Telescope significantly improves the Quality of Experience (QoE) of videos, for a diverse set of network and cache conditions, compared to traditional ABR.

SISSI: An Architecture for Semantic Interoperable Self-Sovereign Identity-based Access Control on the Web

We present an architecture for authentication and authorization on the Web that is based on the Self-Sovereign Identity paradigm. Using our architecture, we aim to achieve semantic interoperability across different approaches to SSI. We build on the underlying RDF data model of the W3C’s recommendation for Verifiable Credentials and specify semantic access control rules using SHACL. Our communication protocol for an authorization process is based on Decentralised Identifiers and extends the Hyperledger Aries Present Proof protocol. We propose a modular architecture that allows for flexible extension, e. g., for supporting more signature schemes or Decentralised Identifier Methods. For evaluation, we implemented a Proof-of-Concept: We show that a Web-based approach to SSI outperfoms a blockchain-based approach to SSI in terms of End-to-End execution time.

Analyzing the Communication Clusters in Datacenters✱

Datacenter networks have become a critical infrastructure of our digital society and over the last years, great efforts have been made to better understand the communication patterns inside datacenters. In particular, existing empirical studies showed that datacenter traffic typically features much temporal and spatial structure, and that at any given time, some communication pairs interact much more frequently than others. This paper generalizes this study to communication groups and analyzes how clustered the datacenter traffic is, and how stable these clusters are over time. To this end, we propose a methodology which revolves around a biclustering approach, allowing us to identify groups of racks and servers which communicate frequently over the network. In particular, we consider communication patterns occurring in three different Facebook datacenters: a Web cluster consisting of web servers serving web traffic, a Database cluster which mainly consists of MySQL servers, and a Hadoop cluster. Interestingly, we find that in all three clusters, small groups of racks and servers can produce a large fraction of the network traffic, and we can determine these groups even when considering short snapshots of network traffic. We also show empirically that these clusters are fairly stable across time. Our insights on the size and stability of communication clusters hence uncover an interesting potential for resource optimizations in datacenter infrastructures.

PipeEdge: A Trusted Pipelining Collaborative Edge Training based on Blockchain

Powered by the massive data generated by the blossom of mobile and Web-of-Things (WoT) devices, Deep Neural Networks (DNNs) have developed both in accuracy and size in recent years. Conventional cloud-based DNN training incurs rapidly-increasing data and model transmission overheads as well as privacy issues. Mobile edge computing (MEC) provides a promising solution by facilitating DNN model training on edge servers at the network edge. However, edge servers often suffer from constrained resources and need to collaborate on DNN training. Unfortunately, managed by different telecoms, edge servers cannot properly collaborate with each other without incentives and trust. In this paper, we introduce PipeEdge, a scheme that promotes collaborative edge training between edge servers by introducing incentives and trust based on blockchain. Under the PipeEdge scheme, edge servers can hire trustworthy workers for pipelined DNN training tasks based on model parallelism. We implement PipeEdge and evaluate it comprehensively with four different DNN models. The results show that it outperforms state-of-the-art schemes by up to 173.98% with negligible overheads.

To Store or Not? Online Data Selection for Federated Learning with Limited Storage

Machine learning models have been deployed in mobile networks to deal with massive data from different layers to enable automated network management and intelligence on devices. To overcome high communication cost and severe privacy concerns of centralized machine learning, federated learning (FL) has been proposed to achieve distributed machine learning among networked devices. While the computation and communication limitation has been widely studied, the impact of on-device storage on the performance of FL is still not explored. Without an effective data selection policy to filter the massive streaming data on devices, classical FL can suffer from much longer model training time (4 ×) and significant inference accuracy reduction (7%), observed in our experiments. In this work, we take the first step to consider the online data selection for FL with limited on-device storage. We first define a new data valuation metric for data evaluation and selection in FL with theoretical guarantees for speeding up model convergence and enhancing final model accuracy, simultaneously. We further design ODE, a framework of Online Data sElection for FL, to coordinate networked devices to store valuable data samples. Experimental results on one industrial dataset and three public datasets show the remarkable advantages of ODE over the state-of-the-art approaches. Particularly, on the industrial dataset, ODE achieves as high as 2.5 × speedup of training time and 6% increase in inference accuracy, and is robust to various factors in practical environments.

ELASTIC: Edge Workload Forecasting based on Collaborative Cloud-Edge Deep Learning

With the rapid development of edge computing in the post-COVID19 pandemic period, precise workload forecasting is considered the basis for making full use of the edge limited resources, and both edge service providers (ESPs) and edge service consumers (ESCs) can benefit significantly from it. Existing paradigms of workload forecasting (i.e., edge-only or cloud-only) are improper, due to failing to consider the inter-site correlations and might suffer from significant data transmission delays. With the increasing adoption of edge platforms by web services, it is critical to balance both accuracy and efficiency in workload forecasting. In this paper, we propose ELASTIC, which is the first study that leverages a cloud-edge collaborative paradigm for edge workload forecasting with multi-view graphs. Specifically, at the global stage, we design a learnable aggregation layer on each edge site to reduce the time consumption while capturing the inter-site correlation. Additionally, at the local stage, we design a disaggregation layer combining both the intra-site correlation and inter-site correlation to improve the prediction accuracy. Extensive experiments on realistic edge workload datasets collected from China’s largest edge service provider show that ELASTIC outperforms state-of-the-art methods, decreases time consumption, and reduces communication cost.

DDPC: Automated Data-Driven Power-Performance Controller Design on-the-fly for Latency-sensitive Web Services

Traditional power reduction techniques such as DVFS or RAPL are challenging to use with web services because they significantly affect the services’ latency and throughput. Previous work suggested the use of controllers based on control theory or machine learning to reduce performance degradation under constrained power. However, generating these controllers is challenging as every web service applications running in a data center requires a power-performance model and a fine-tuned controller. In this paper, we present DDPC, a system for autonomic data-driven controller generation for power-latency management. DDPC automates the process of designing and deploying controllers for dynamic power allocation to manage the power-performance trade-offs for latency-sensitive web applications such as a social network. For each application, DDPC uses system identification techniques to learn an adaptive power-performance model that captures the application’s power-latency trade-offs which is then used to generate and deploy a Proportional-Integral (PI) power controller with gain-scheduling to dynamically manage the power allocation to the server running application using RAPL. We evaluate DDPC with two realistic latency-sensitive web applications under varying load scenarios. Our results show that DDPC is capable of autonomically generating and deploying controllers within a few minutes reducing the active power allocation of a web-server by more than 50% compared to state-of-the-art techniques while maintaining the latency well below the target of the application.

DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization

Device Model Generalization (DMG) is a practical yet under-investigated research topic for on-device machine learning applications. It aims to improve the generalization ability of pre-trained models when deployed on resource-constrained devices, such as improving the performance of pre-trained cloud models on smart mobiles. While quite a lot of works have investigated the data distribution shift across clouds and devices, most of them focus on model fine-tuning on personalized data for individual devices to facilitate DMG. Despite their promising, these approaches require on-device re-training, which is practically infeasible due to the overfitting problem and high time delay when performing gradient calculation on real-time data. In this paper, we argue that the computational cost brought by fine-tuning can be rather unnecessary. We consequently present a novel perspective to improving DMG without increasing computational cost, i.e., device-specific parameter generation which directly maps data distribution to parameters. Specifically, we propose an efficient Device-cloUd collaborative parametErs generaTion framework (DUET). DUET is deployed on a powerful cloud server that only requires the low cost of forwarding propagation and low time delay of data transmission between the device and the cloud. By doing so, DUET can rehearse the device-specific model weight realizations conditioned on the personalized real-time data for an individual device. Importantly, our DUET elegantly connects the cloud and device as a “duet” collaboration, frees the DMG from fine-tuning, and enables a faster and more accurate DMG paradigm. We conduct an extensive experimental study of DUET on three public datasets, and the experimental results confirm our framework’s effectiveness and generalisability for different DMG tasks.

Detecting Socially Abnormal Highway Driving Behaviors via Recurrent Graph Attention Networks

With the rapid development of Internet of Things technologies, the next generation traffic monitoring infrastructures are connected via the web, to aid traffic data collection and intelligent traffic management. One of the most important tasks in traffic is anomaly detection, since abnormal drivers can reduce traffic efficiency and cause safety issues. This work focuses on detecting abnormal driving behaviors from trajectories produced by highway video surveillance systems. Most of the current abnormal driving behavior detection methods focus on a limited category of abnormal behaviors that deal with a single vehicle without considering vehicular interactions. In this work, we consider the problem of detecting a variety of socially abnormal driving behaviors, i.e., behaviors that do not conform to the behavior of other nearby drivers. This task is complicated by the variety of vehicular interactions and the spatial-temporal varying nature of highway traffic. To solve this problem, we propose an autoencoder with a Recurrent Graph Attention Network that can capture the highway driving behaviors contextualized on the surrounding cars, and detect anomalies that deviate from learned patterns. Our model is scalable to large freeways with thousands of cars. Experiments on data generated from traffic simulation software show that our model is the only one that can spot the exact vehicle conducting socially abnormal behaviors, among the state-of-the-art anomaly detection models. We further show the performance on real world HighD traffic dataset, where our model detects vehicles that violate the local driving norms.

GROUP: An End-to-end Multi-step-ahead Workload Prediction Approach Focusing on Workload Group Behavior

Accurately forecasting workloads can enable web service providers to achieve proactive runtime management for applications and ensure service quality and cost efficiency. For cloud-native applications, multiple containers collaborate to handle user requests, making each container’s workload changes influenced by workload group behavior. However, existing approaches mainly analyze the individual changes of each container and do not explicitly model the workload group evolution of containers, resulting in sub-optimal results. Therefore, we propose a workload prediction method, GROUP, which implements the shifts of workload prediction focus from individual to group, workload group behavior representation from data similarity to data correlation, and workload group behavior evolution from implicit modeling to explicit modeling. First, we model the workload group behavior and its evolution from multiple perspectives. Second, we propose a container correlation calculation algorithm that considers static and dynamic container information to represent the workload group behavior. Third, we propose an end-to-end multi-step-ahead prediction method that explicitly portrays the complex relationship between the evolution of workload group behavior and the workload changes of each container. Lastly, enough experiments on public datasets show the advantages of GROUP, which provides an effective solution to achieve workload prediction for cloud-native applications.

Will Admins Cope? Decentralized Moderation in the Fediverse

As an alternative to Twitter and other centralized social networks, the Fediverse is growing in popularity. The recent, and polemical, takeover of Twitter by Elon Musk has exacerbated this trend. The Fediverse includes a growing number of decentralized social networks, such as Pleroma or Mastodon, that share the same subscription protocol (ActivityPub). Each of these decentralized social networks is composed of independent instances that are run by different administrators. Users, however, can interact with other users across the Fediverse regardless of the instance they are signed up to. The growing user base of the Fediverse creates key challenges for the administrators, who may experience a growing burden. In this paper, we explore how large that overhead is, and whether there are solutions to alleviate the burden. We study the overhead of moderation on the administrators. We observe a diversity of administrator strategies, with evidence that administrators on larger instances struggle to find sufficient resources. We then propose a tool, WatchGen, to semi-automate the process.

BiSR: Bidirectionally Optimized Super-Resolution for Mobile Video Streaming

The user experience of mobile web video streaming is often impacted by insufficient and dynamic network bandwidth. In this paper, we design Bidirectionally Optimized Super-Resolution (BiSR) to improve the quality of experience (QoE) for mobile web users under limited bandwidth. BiSR exploits a deep neural network (DNN)-based model to super-resolve key frames efficiently without changing the inter-frame spatial-temporal information. We then propose a downscaling DNN and a mobile-specific optimized lightweight super-resolution DNN to enhance the performance. Finally, a novel reinforcement learning-based adaptive bitrate (ABR) algorithm is proposed to verify the performance of BiSR on real network traces. Our evaluation, using a full system implementation, shows that BiSR saves 26% of bitrate compared to the traditional H.264 codec and improves the SSIM of video by 3.7% compared to the prior state-of-the-art. Overall, BiSR enhances the user-perceived quality of experience by up to 30.6%.

Are Mobile Advertisements in Compliance with App’s Age Group?

As smartphones and mobile apps permeate every aspect of people’s lives, children are accessing mobile devices at an increasingly younger age. The inescapable exposure of advertisements in mobile apps to children has grown alarmingly. Mobile advertisements are placed by advertisers and subsequently distributed by ad SDKs, under the rare control of app developers and app markets’ content ratings. Indeed, content that is objectionable and harmful to children’s mental health has been reported to appear in advertising, such as pornography. However, few studies have yet concentrated on automatically and comprehensively identifying such kid-unsuitable mobile advertising. In this paper, we first characterize the regulations for mobile ads relating to children. We then propose our novel automated dynamic analysis framework, named AdRambler, that attempts to collect ad content throughout the lifespan of mobile ads and identify their inappropriateness for child app users. Using AdRambler, we conduct a large-scale (25,000 mobile apps) empirical investigation and reveal the non-incidental presence of inappropriate ads in apps with child-included target audiences. We collected 11,270 ad views and identified 1,289 ad violations (from 775 apps) of child user regulations, with roughly half of the app promotions not in compliance with host apps’ content ratings. Our finding indicates that even certified ad SDKs could still propagate inappropriate advertisements. We further delve into the question of accountability for the presence of inappropriate advertising and provide concrete suggestions for all stakeholders to take action for the benefit of children.

EdgeMove: Pipelining Device-Edge Model Training for Mobile Intelligence

Training machine learning (ML) models on mobile and Web-of-Things (WoT) has been widely acknowledged and employed as a promising solution to privacy-preserving ML. However, these end-devices often suffer from constrained resources and fail to accommodate increasingly large ML models that crave great computation power. Offloading ML models partially to the cloud for training strikes a trade-off between privacy preservation and resource requirements. However, device-cloud training creates communication overheads that delay model training tremendously. This paper presents EdgeMove, the first device-edge training scheme that enables fast pipelined model training across edge devices and edge servers. It employs probing-based mechanisms to tackle the new challenges raised by device-edge training. Before training begins, it probes nearby edge servers’ training performance and bootstraps model training by constructing a training pipeline with an approximate model partitioning. During the training process, EdgeMove accommodates user mobility and system dynamics by probing nearby edge servers’ training performance adaptively and adapting the training pipeline proactively. Extensive experiments are conducted with two popular DNN models trained on four datasets for three ML tasks. The results demonstrate that EdgeMove achieves a 1.3 × -2.1 × speedup over the state-of-the-art scheme.

HTTP Steady Connections for Robust Web Acceleration

HTTP’s intrinsic request-and-response traffic pattern makes most web servers often idle, leaving a potential to accelerate page loads. We present the notion of HTTP steady connections, which fully utilizes the server’s available network bandwidth during a page load using the promising HTTP/3 server push, transforming the intermittent workload of loading a page into a more steady one. To construct a proper server push policy to achieve this, we separate the structure of a page, which is a relatively static factor, from the page load environments including client and network characteristics, which are generally dynamic and unknown to servers. We formulate a deadline-based sequencing problem using a page load model with dependency graphs and design a feedback-based reprioritization mechanism within HTTP server push to reactively match client progress robustly. Experiments with a prototype and a wide range of real-world pages show that HTTP steady connections significantly improve web page loads compared with state-of-the-art accelerators, even under packet losses and without any prior knowledge of network environments.

SESSION: Search

Bipartite Graph Convolutional Hashing for Effective and Efficient Top-N Search in Hamming Space

Searching on bipartite graphs is basal and versatile to many real-world Web applications, e.g., online recommendation, database retrieval, and query-document searching. Given a query node, the conventional approaches rely on the similarity matching with the vectorized node embeddings in the continuous Euclidean space. To efficiently manage intensive similarity computation, developing hashing techniques for graph-structured data has recently become an emerging research direction. Despite the retrieval efficiency in Hamming space, prior work is however confronted with catastrophic performance decay. In this work, we investigate the problem of hashing with Graph Convolutional Network on bipartite graphs for effective Top-N search. We propose an end-to-end Bipartite Graph Convolutional Hashing approach, namely BGCH, which consists of three novel and effective modules: (1) adaptive graph convolutional hashing, (2) latent feature dispersion, and (3) Fourier serialized gradient estimation. Specifically, the former two modules achieve the substantial retention of the structural information against the inevitable information loss in hash encoding; the last module develops Fourier Series decomposition to the hashing function in the frequency domain mainly for more accurate gradient estimation. The extensive experiments on six real-world datasets not only show the performance superiority over the competing hashing-based counterparts, but also demonstrate the effectiveness of all proposed model components contained therein.

Beyond Two-Tower: Attribute Guided Representation Learning for Candidate Retrieval

Candidate retrieval is a key part of the modern search engines whose goal is to find candidate items that are semantically related to the query from a large item pool. The core difference against the later ranking stage is the requirement of low latency. Hence, two-tower structure with two parallel yet independent encoder for both query and item is prevalent in many systems. In these efforts, the semantic information of a query and a candidate item is fed into the corresponding encoder and then use their representations for retrieval. With the popularity of pre-trained semantic models, the state-of-the-art for semantic retrieval tasks has achieved the significant performance gain.

However, the capacity of learning relevance signals is still limited by the isolation between the query and the item. The interaction-based modeling between the query and the item has been widely validated to be useful for the ranking stage, where more computation cost is affordable. Here, we are quite initerested in an demanding question: how to exploiting query-item interaction-based learning to enhance candidate retrieval and still maintain the low computation cost. Note that an item usually contain various heteorgeneous attributes which could help us understand the item characteristics more precisely. To this end, we propose a novel attribute guided representation learning framework (named AGREE) to enhance the candidate retrieval by exploiting query-attribute relevance. The key idea is to couple the query and item representation learning together during the training phase, but also enable easy decoupling for efficient inference. Specifically, we introduce an attribute fusion layer in the item side to identify most relevant item features for item representation. On the query side, an attribute-aware learning process is introduced to better infer the search intent also from these attributes. After model training, we then decouple the attribute information away from the query encoder, which guarantees the low latency for the inference phase. Extensive experiments over two real-world large-scale datasets demonstrate the superiority of the proposed AGREE against several state-of-the-art technical alternatives. Further online A/B test from AliPay search servise also show that AGREE achieves substantial performance gain over four business metrics. Currently, the proposed AGREE has been deployed online in AliPay for serving major traffic.

Improving Content Retrievability in Search with Controllable Query Generation

An important goal of online platforms is to enable content discovery, i.e. allow users to find a catalog entity they were not familiar with. A pre-requisite to discover an entity, e.g. a book, with a search engine is that the entity is retrievable, i.e. there are queries for which the system will surface such entity in the top results. However, machine-learned search engines have a high retrievability bias, where the majority of the queries return the same entities. This happens partly due to the predominance of narrow intent queries, where users create queries using the title of an already known entity, e.g. in book search “harry potter”. The amount of broad queries where users want to discover new entities, e.g. in music search “chill lyrical electronica with an atmospheric feeling to it”, and have a higher tolerance to what they might find, is small in comparison. We focus here on two factors that have a negative impact on the retrievability of the entities (I) the training data used for dense retrieval models and (II) the distribution of narrow and broad intent queries issued in the system. We propose CtrlQGen, a method that generates queries for a chosen underlying intent—narrow or broad. We can use CtrlQGen to improve factor (I) by generating training data for dense retrieval models comprised of diverse synthetic queries. CtrlQGen can also be used to deal with factor (II) by suggesting queries with broader intents to users. Our results on datasets from the domains of music, podcasts, and books reveal that we can significantly decrease the retrievability bias of a dense retrieval model when using CtrlQGen. First, by using the generated queries as training data for dense models we make 9% of the entities retrievable—go from zero to non-zero retrievability. Second, by suggesting broader queries to users, we can make 12% of the entities retrievable in the best case.

Learning Denoised and Interpretable Session Representation for Conversational Search

Conversational search supports multi-turn user-system interactions to solve complex information needs. Compared with the traditional single-turn ad-hoc search, conversational search faces a more complex search intent understanding problem because a conversational search session is much longer and contains many noisy tokens. However, existing conversational dense retrieval solutions simply fine-tune the pre-trained ad-hoc query encoder on limited conversational search data, which are hard to achieve satisfactory performance in such a complex conversational search scenario. Meanwhile, the learned latent representation also lacks interpretability that people cannot perceive how the model understands the session. To tackle the above drawbacks, we propose a sparse Lexical-based Conversational REtriever (LeCoRE), which extends the SPLADE model with two well-matched multi-level denoising methods uniformly based on knowledge distillation and external query rewrites to generate denoised and interpretable lexical session representation. Extensive experiments on four public conversational search datasets in both normal and zero-shot evaluation settings demonstrate the strong performance of LeCoRE towards more effective and interpretable conversational search.

LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval

Retrieval models based on dense representations in semantic space have become an indispensable branch for first-stage retrieval. These retrievers benefit from surging advances in representation learning towards compressive global sequence-level embeddings. However, they are prone to overlook local salient phrases and entity mentions in texts, which usually play pivot roles in first-stage retrieval. To mitigate this weakness, we propose to make a dense retriever align a well-performing lexicon-aware representation model. The alignment is achieved by weakened knowledge distillations to enlighten the retriever via two aspects – 1) a lexicon-augmented contrastive objective to challenge the dense encoder and 2) a pair-wise rank-consistent regularization to make the dense model’s behavior incline to the other. We evaluate our model on three public benchmarks, which shows that with a comparable lexicon-aware retriever as the teacher, our proposed dense one can bring consistent and significant improvements, and even outdo its teacher. In addition, we show our lexicon-aware distillation strategies are compatible with the standard ranker distillation, which can further lift state-of-the-art performance.1

RL-MPCA: A Reinforcement Learning Based Multi-Phase Computation Allocation Approach for Recommender Systems

Recommender systems aim to recommend the most suitable items to users from a large number of candidates. Their computation cost grows as the number of user requests and the complexity of services (or models) increases. Under the limitation of computation resources (CRs), how to make a trade-off between computation cost and business revenue becomes an essential question. The existing studies focus on dynamically allocating CRs in queue truncation scenarios (i.e., allocating the size of candidates), and formulate the CR allocation problem as an optimization problem with constraints. Some of them focus on single-phase CR allocation, and others focus on multi-phase CR allocation but introduce some assumptions about queue truncation scenarios. However, these assumptions do not hold in other scenarios, such as retrieval channel selection and prediction model selection. Moreover, existing studies ignore the state transition process of requests between different phases, limiting the effectiveness of their approaches.

This paper proposes a Reinforcement Learning (RL) based Multi-Phase Computation Allocation approach (RL-MPCA), which aims to maximize the total business revenue under the limitation of CRs. RL-MPCA formulates the CR allocation problem as a Weakly Coupled MDP problem and solves it with an RL-based approach. Specifically, RL-MPCA designs a novel deep Q-network to adapt to various CR allocation scenarios, and calibrates the Q-value by introducing multiple adaptive Lagrange multipliers (adaptive-λ) to avoid violating the global CR constraints. Finally, experiments on the offline simulation environment and online real-world recommender system validate the effectiveness of our approach.

FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search

Approximate K-Nearest Neighbor Search (AKNNS) has now become ubiquitous in modern applications, such as a fast search procedure with two-tower deep learning models. Graph-based methods for AKNNS in particular have received great attention due to their superior performance. These methods rely on greedy graph search to traverse the data points as embedding vectors in a database. Under this greedy search scheme, we make a key observation: many distance computations do not influence search updates so that these computations can be approximated without hurting performance. As a result, we propose FINGER, a fast inference method for efficient graph search in AKNNS. FINGER approximates the distance function by estimating angles between neighboring residual vectors. The approximated distance can be used to bypass unnecessary computations for faster searches. Empirically, when it comes to speeding up the inference of HNSW, which is one of the most popular graph-based AKNNS methods, FINGER significantly outperforms existing acceleration approaches and conventional libraries by 20 to 60 across different benchmark datasets.

A Passage-Level Reading Behavior Model for Mobile Search

Reading is a vital and complex cognitive activity during users’ information-seeking process. Several studies have focused on understanding users’ reading behavior in desktop search. Their findings greatly contribute to the design of information retrieval models. However, little is known about how users read a result in mobile search, although search currently happens more frequently in mobile scenarios. In this paper, we conduct a lab-based user study to investigate users’ fine-grained reading behavior patterns in mobile search. We find that users’ reading attention allocation is strongly affected by several behavior biases, such as position and selection biases. Inspired by these findings, we propose a probabilistic generative model, the Passage-level Reading behavior Model (PRM), to model users’ reading behavior in mobile search. The PRM utilizes observable passage-level exposure and viewport duration events to infer users’ unobserved skimming event, reading event, and satisfaction perception during the reading process. Besides fitting the passage-level reading behavior, we utilize the fitted parameters of PRM to estimate the passage-level and document-level relevance. Experimental results show that PRM outperforms existing unsupervised relevance estimation models. PRM has strong interpretability and provides valuable insights into the understanding of how users seek and perceive useful information in mobile search.

Learning To Rank Resources with GNN

As the content on the Internet continues to grow, many new dynamically changing and heterogeneous sources of data constantly emerge. A conventional search engine cannot crawl and index at the same pace as the expansion of the Internet. Moreover, a large portion of the data on the Internet is not accessible to traditional search engines. Distributed Information Retrieval (DIR) is a viable solution to this as it integrates multiple shards (resources) and provides a unified access to them. Resource selection is a key component of DIR systems. There is a rich body of literature on resource selection approaches for DIR. A key limitation of the existing approaches is that they primarily use term-based statistical features and do not generally model resource-query and resource-resource relationships. In this paper, we propose a graph neural network (GNN) based approach to learning-to-rank that is capable of modeling resource-query and resource-resource relationships. Specifically, we utilize a pre-trained language model (PTLM) to obtain semantic information from queries and resources. Then, we explicitly build a heterogeneous graph to preserve structural information of query-resource relationships and employ GNN to extract structural information. In addition, the heterogeneous graph is enriched with resource-resource type of edges to further enhance the ranking accuracy. Extensive experiments on benchmark datasets show that our proposed approach is highly effective in resource selection. Our method outperforms the state-of-the-art by 6.4% to 42% on various performance metrics.

Match4Match: Enhancing Text-Video Retrieval by Maximum Flow with Minimum Cost

With the explosive growth of video and text data on the web, text-video retrieval has become a vital task for online video platforms. Recently, text-video retrieval methods based on pre-trained models have attracted a lot of attention. However, existing methods cannot effectively capture the fine-grained information in videos, and typically suffer from the hubness problem where a collection of similar videos are retrieved by a large number of different queries. In this paper, we propose Match4Match, a new text-video retrieval method based on CLIP (Contrastive Language-Image Pretraining) and graph optimization theories. To balance calculation efficiency and model accuracy, Match4Match seamlessly supports three inference modes for different application scenarios. In fast vector retrieval mode, we embed texts and videos in the same space and employ a vector retrieval engine to obtain the top K videos. In fine-grained alignment mode, our method fully utilizes the pre-trained knowledge of the CLIP model to align words with corresponding video frames, and uses the fine-grained information to compute text-video similarity more accurately. In flow-style matching mode, to alleviate the detrimental impact of the hubness problem, we model the retrieval problem as a combinatorial optimization problem and solve it using maximum flow with minimum cost algorithm. To demonstrate the effectiveness of our method, we conduct experiments on five public text-video datasets. The overall performance of our proposed method outperforms state-of-the-art methods. Additionally, we evaluate the computational efficiency of Match4Match. Benefiting from the three flexible inference modes, Match4Match can respond to a large number of query requests with low latency or achieve high recall with acceptable time consumption.

CgAT: Center-Guided Adversarial Training for Deep Hashing-Based Retrieval

Deep hashing has been extensively utilized in massive image retrieval because of its efficiency and effectiveness. However, deep hashing models are vulnerable to adversarial examples, making it essential to develop adversarial defense methods for image retrieval. Existing solutions achieved limited defense performance because of using weak adversarial samples for training and lacking discriminative optimization objectives to learn robust features. In this paper, we present a min-max based Center-guided Adversarial Training, namely CgAT, to improve the robustness of deep hashing networks through worst adversarial examples. Our key idea is to formulate a hash code (dubbed center code) as a discriminative semantic representation of the original sample, which can be used to guide the generation of the powerful adversarial example and as an accurate optimization objective for adversarial training. Specifically, we first formulate the center code as a semantically-discriminative representative of the input image content, which preserves the semantic similarity with positive samples and dissimilarity with negative examples. We prove that a mathematical formula can calculate the center code immediately. After obtaining the center codes in each optimization iteration of the deep hashing network, they are adopted to guide the adversarial training process. On the one hand, CgAT generates the worst adversarial examples as augmented data by maximizing the Hamming distance between the hash codes of the adversarial examples and the center codes. On the other hand, CgAT learns to mitigate the effects of adversarial samples by minimizing the Hamming distance to the center codes. Extensive experiments on the benchmark datasets demonstrate the effectiveness of our adversarial training algorithm in defending against adversarial attacks for deep hashing-based retrieval. Compared with the current state-of-the-art defense method, we significantly improve the defense performance by an average of 18.61%, 12.35%, and 11.56% on FLICKR-25K, NUS-WIDE, and MS-COCO, respectively. The code is available at https://github.com/xunguangwang/CgAT.

Algorithmic Vibe in Information Retrieval

When information retrieval systems return a ranked list of results in response to a query, they may be choosing from a large set of candidate results that are equally useful and relevant. This means we might be able to identify a difference between rankers A and B, where ranker A systematically prefers a certain type of relevant results. Ranker A may have this systematic difference (different “vibe”) without having systematically better or worse results according to standard information retrieval metrics. We first show that a vibe difference can exist, comparing two publicly available rankers, where the one that is trained on health-related queries will systematically prefer health-related results, even for non-health queries. We define a vibe metric that lets us see the words that a ranker prefers. We investigate the vibe of search engine clicks vs. human labels. We perform an initial study into correcting for vibe differences to make ranker A more like ranker B via changes in negative sampling during training.

Zero-shot Clarifying Question Generation for Conversational Search

A long-standing challenge for search and conversational assistants is query intention detection in ambiguous queries. Asking clarifying questions in conversational search has been widely studied and considered an effective solution to resolve query ambiguity. Existing work have explored various approaches for clarifying question ranking and generation. However, due to the lack of real conversational search data, they have to use artificial datasets for training, which limits their generalizability to real-world search scenarios. As a result, the industry has shown reluctance to implement them in reality, further suspending the availability of real conversational search interaction data. The above dilemma can be formulated as a cold start problem of clarifying question generation and conversational search in general. Furthermore, even if we do have large-scale conversational logs, it is not realistic to gather training data that can comprehensively cover all possible queries and topics in open-domain search scenarios. The risk of fitting bias when training a clarifying question retrieval/generation model on incomprehensive dataset is thus another important challenge.

In this work, we innovatively explore generating clarifying questions in a zero-shot setting to overcome the cold start problem and we propose a constrained clarifying question generation system which uses both question templates and query facets to guide the effective and precise question generation. The experiment results show that our method outperforms existing state-of-the-art zero-shot baselines by a large margin. Human annotations to our model outputs also indicate our method generates 25.2% more natural questions, 18.1% more useful questions, 6.1% less unnatural and 4% less useless questions.

PROD: Progressive Distillation for Dense Retrieval

Knowledge distillation is an effective way to transfer knowledge from a strong teacher to an efficient student model. Ideally, we expect the better the teacher is, the better the student performs. However, this expectation does not always come true. It is common that a strong teacher model results in a bad student via distillation due to the nonnegligible gap between teacher and student. To bridge the gap, we propose PROD, a PROgressive Distillation method, for dense retrieval. PROD consists of a teacher progressive distillation and a data progressive distillation to gradually improve the student. To alleviate catastrophic forgetting, we introduce a regularization term in each distillation process. We conduct extensive experiments on seven datasets including five widely-used publicly available benchmarks: MS MARCO Passage, TREC Passage 19, TREC Document 19, MS MARCO Document, and Natural Questions, as well as two industry datasets: Bing-Rel and Bing-Ads. PROD achieves the state-of-the-art in the distillation methods for dense retrieval. Our 6-layer student model even surpasses most of the existing 12-layer models on all five public benchmarks. The code and models are released in https://github.com/microsoft/SimXNS.

FANS: Fast Non-Autoregressive Sequence Generation for Item List Continuation

User-curated item lists, such as video-based playlists on Youtube and book-based lists on Goodreads, have become prevalent for content sharing on online platforms. Item list continuation is proposed to model the overall trend of a list and predict subsequent items. Recently, Transformer-based models have shown promise in comprehending contextual information and capturing item relationships in a list. However, deploying them in real-time industrial applications is challenging, mainly because the autoregressive generation mechanism used in them is time-consuming. In this paper, we propose a novel fast non-autoregressive sequence generation model, namely FANS, to enhance inference efficiency and quality for item list continuation. First, we use a non-autoregressive generation mechanism to decode next K items simultaneously instead of one by one in existing models. Then, we design a two-stage classifier to replace the vanilla classifier used in current transformer-based models to further reduce the decoding time. Moreover, to improve the quality of non-autoregressive generation, we employ a curriculum learning strategy to optimize training. Experimental results on four real-world item list continuation datasets including Zhihu, Spotify, AotM, and Goodreads show that our FANS model can significantly improve inference efficiency (up to 8.7x) while achieving competitive or better generation quality for item list continuation compared with the state-of-the-art autoregressive models. We also validate the efficiency of FANS in an industrial setting. Our source code and data will be available at MindSpore/models1 and Github2.

DANCE: Learning A Domain Adaptive Framework for Deep Hashing

This paper studies unsupervised domain adaptive hashing, which aims to transfer a hashing model from a label-rich source domain to a label-scarce target domain. Current state-of-the-art approaches generally resolve the problem by integrating pseudo-labeling and domain adaptation techniques into deep hashing paradigms. Nevertheless, they usually suffer from serious class imbalance in pseudo-labels and suboptimal domain alignment caused by the neglection of the intrinsic structures of two domains. To address this issue, we propose a novel method named unbiaseD duAl hashiNg Contrastive lEarning (DANCE) for domain adaptive image retrieval. The core of our DANCE is to perform contrastive learning on hash codes from both instance level and prototype level. To begin, DANCE utilizes label information to guide instance-level hashing contrastive learning in the source domain. To generate unbiased and reliable pseudo-labels for semantic learning in the target domain, we uniformly select samples around each label embedding in the Hamming space. A momentum-update scheme is also utilized to smooth the optimization process. Additionally, we measure the semantic prototype representations in both source and target domains and incorporate them into a domain-aware prototype-level contrastive learning paradigm, which enhances domain alignment in the Hamming space while maximizing the model capacity. Experimental results on a number of well-known domain adaptive retrieval benchmarks validate the effectiveness of our proposed DANCE compared to a variety of competing baselines in different settings.

Geographic Information Retrieval Using Wikipedia Articles

Assigning semantically relevant, real-world locations to documents opens new possibilities to perform geographic information retrieval. We propose a novel approach to automatically determine the latitude-longitude coordinates of appropriate Wikipedia articles with high accuracy, leveraging both text and metadata in the corpus. By examining articles whose base-truth coordinates are known, we show that our method attains a substantial improvement over state of the art works. We subsequently demonstrate how our approach could yield two benefits: (1) detecting significant geolocation errors in Wikipedia; and (2) proposing approximated coordinates for hundreds of thousands of articles which are not traditionally considered to be locations (such as events, ideas or people), opening new possibilities for conceptual geographic retrievals over Wikipedia.

Everything Evolves in Personalized PageRank

Personalized PageRank, as a graphical model, has been proven as an effective solution in many applications such as web page search, recommendation, etc. However, in the real world, the setting of personalized PageRank is usually dynamic like the evolving World Wide Web. On the one hand, the outdated PageRank solution can be sub-optimal for ignoring the evolution pattern. On the other hand, solving the solution from the scratch at each timestamp causes costly computation complexity. Hence, in this paper, we aim to solve the Personalized PageRank effectively and efficiently in a fully dynamic setting, i.e., every component in the Personalized PageRank formula is dependent on time. To this end, we propose the EvePPR method that can track the exact personalized PageRank solution at each timestamp in the fully dynamic setting, and we theoretically and empirically prove the accuracy and time complexity of EvePPR. Moreover, we apply EvePPR to solve the dynamic knowledge graph alignment task, where a fully dynamic setting is necessary but complex. The experiments show that EvePPR outperforms the state-of-the-art baselines for similar nodes retrieval across graphs.

Differentiable Optimized Product Quantization and Beyond

Vector quantization techniques, such as Product Quantization (PQ), play a vital role in approximate nearest neighbor search (ANNs) and maximum inner product search (MIPS) owing to their remarkable search and storage efficiency. However, the indexes in vector quantization cannot be trained together with the inference models since data indexing is not differentiable. To this end, differentiable vector quantization approaches, such as DiffPQ and DeepPQ, have been recently proposed, but existing methods have two drawbacks. First, they do not impose any constraints on codebooks, such that the resultant codebooks lack diversity, leading to limited retrieval performance. Second, since data indexing resorts to operator, differentiability is usually achieved by either relaxation or Straight-Through Estimation (STE), which leads to biased gradient and slow convergence. To address these problems, we propose a Differentiable Optimized Product Quantization method (DOPQ) and beyond in this paper. Particularly, each data is projected into multiple orthogonal spaces, to generate multiple views of data. Thus, each codebook is learned with one view of data, guaranteeing the diversity of codebooks. Moreover, instead of simple differentiable relaxation, DOPQ optimizes the loss based on direct loss minimization, significantly reducing the gradient bias problem. Finally, DOPQ is evaluated with seven datasets of both recommendation and image search tasks. Extensive experimental results show that DOPQ outperforms state-of-the-art baselines by a large margin.

Incorporating Explicit Subtopics in Personalized Search

The key to personalized search is modeling user intents to tailor returned results for different users. Existing personalized methods mainly focus on learning implicit user interest vectors. In this paper, we propose ExpliPS, a personalized search model that explicitly incorporates query subtopics into personalization. It models the user’s current intent by estimating the user’s preference over the subtopics of the current query and personalizes the results over the weighted subtopics. We think that in such a way, personalized search could be more explainable and stable. Specifically, we first employ a semantic encoder to learn the representations of the user’s historical behaviours. Then with the historical behaviour representations, a subtopic preference encoder is devised to predict the user’s subtopic preferences on the current query. Finally, we rerank the candidates via a subtopic-aware ranker that prioritizes the documents relevant to the user-preferred subtopics. Experimental results show our model ExpliPS outperforms the state-of-the-art personalized web search models with explainable and stable results.

Optimizing Guided Traversal for Fast Learned Sparse Retrieval

Recent studies show that BM25-driven dynamic index skipping can greatly accelerate MaxScore-based document retrieval based on the learned sparse representation derived by DeepImpact. This paper investigates the effectiveness of such a traversal guidance strategy during top k retrieval when using other models such as SPLADE and uniCOIL, and finds that unconstrained BM25-driven skipping could have a visible relevance degradation when the BM25 model is not well aligned with a learned weight model or when retrieval depth k is small. This paper generalizes the previous work and optimizes the BM25 guided index traversal with a two-level pruning control scheme and model alignment for fast retrieval using a sparse representation. Although there can be a cost of increased latency, the proposed scheme is much faster than the original MaxScore method without BM25 guidance while retaining the relevance effectiveness. This paper analyzes the competitiveness of this two-level pruning scheme, and evaluates its tradeoff in ranking relevance and time efficiency when searching several test datasets.

Optimizing Feature Set for Click-Through Rate Prediction

Click-through prediction (CTR) models transform features into latent vectors and enumerate possible feature interactions to improve performance based on the input feature set. Therefore, when selecting an optimal feature set, we should consider the influence of both features and their interaction. However, most previous works focus on either feature field selection or only select feature interaction based on the fixed feature set to produce the feature set. The former restricts search space to the feature field, which is too coarse to determine subtle features. They also do not filter useless feature interactions, leading to higher computation costs and degraded model performance. The latter identifies useful feature interaction from all available features, resulting in many redundant features in the feature set. In this paper, we propose a novel method named OptFS to address these problems. To unify the selection of features and their interaction, we decompose the selection of each feature interaction into the selection of two correlated features. Such a decomposition makes the model end-to-end trainable given various feature interaction operations. By adopting feature-level search space, we set a learnable gate to determine whether each feature should be within the feature set. Because of the large-scale search space, we develop a learning-by-continuation training scheme to learn such gates. Hence, OptFS generates the feature set containing features that improve the final prediction results. Experimentally, we evaluate OptFS on three public datasets, demonstrating OptFS can optimize feature sets which enhance the model performance and further reduce both the storage and computational cost.

A Reference-Dependent Model for Web Search Evaluation: Understanding and Measuring the Experience of Boundedly Rational Users

Previous researches demonstrate that users’ actions in search interaction are associated with relative gains and losses to reference points, known as the reference dependence effect. However, this widely confirmed effect is not represented in most user models underpinning existing search evaluation metrics. In this study, we propose a new evaluation metric framework, namely Reference Dependent Metric (ReDeM), for assessing query-level search by incorporating the effect of reference dependence into the modelling of user search behavior. To test the overall effectiveness of the proposed framework, (1) we evaluate the performance, in terms of correlation with user satisfaction, of ReDeMs built upon different reference points against that of the widely-used metrics on three search datasets; (2) we examine the performance of ReDeMs under different task states, like task difficulty and task urgency; and (3) we analyze the statistical reliability of ReDeMs in terms of discriminative power. Experimental results indicate that: (1) ReDeMs integrated with a proper reference point achieve better correlations with user satisfaction than most of the existing metrics, like Discounted Cumulative Gain (DCG) and Rank-Biased Precision (RBP), even though their parameters have already been well-tuned; (2) ReDeMs reach relatively better performance compared to existing metrics when the task triggers a high-level cognitive load; (3) the discriminative power of ReDeMs is far stronger than Expected Reciprocal Rank (ERR), slightly stronger than Precision and similar to DCG, RBP and INST. To our knowledge, this study is the first to explicitly incorporate the reference dependence effect into the user browsing model and offline evaluation metrics. Our work illustrates a promising approach to leveraging the insights about user biases from cognitive psychology in better evaluating user search experience and enhancing user models.

Filtered-DiskANN: Graph Algorithms for Approximate Nearest Neighbor Search with Filters

As Approximate Nearest Neighbor Search (ANNS)-based dense retrieval becomes ubiquitous for search and recommendation scenarios, efficiently answering filtered ANNS queries has become a critical requirement. Filtered ANNS queries ask for the nearest neighbors of a query’s embedding from the points in the index that match the query’s labels such as date, price range, language. There has been little prior work on algorithms that use label metadata associated with vector data to build efficient indices for filtered ANNS queries. Consequently, current indices have high search latency or low recall which is not practical in interactive web-scenarios. We present two algorithms with native support for faster and more accurate filtered ANNS queries: one with streaming support, and another based on batch construction. Central to our algorithms is the construction of a graph-structured index which forms connections not only based on the geometry of the vector data, but also the associated label set. On real-world data with natural labels, both algorithms are an order of magnitude or more efficient for filtered queries than the current state of the art algorithms. The generated indices also be queried from an SSD and support thousands of queries per second at over recall@10.

SESSION: Economics, Monetization, and Online Markets

Ad Auction Design with Coupon-Dependent Conversion Rate in the Auto-bidding World

Online advertising has become a dominant source of revenue of the Internet. In classic auction theory, only the auctioneer (i.e., the platform) and buyers (i.e., the advertisers) are involved, while the advertising audiences are ignored. For ecommerce advertising, however, the platform can provide coupons for the advertising audiences and nudge them into purchasing more products at lower prices (e.g., 2 dollars off the regular price). Such promotions can lead to an increase in amount and value of purchases. In this paper, we jointly design the coupon value computation, slot allocation, and payment of online advertising in an auto-bidding world. Firstly, we propose the auction mechanism, named CFA-auction (i.e., Coupon-For-the-Audiences-auction), which takes advertising audiences into account in the auction design. We prove the existence of pacing equilibrium, and show that CFA-auction satisfies the IC (incentive compatibility), IR (individual rationality) constraints. Then, we study the optimality of CFA-auction, and prove it can maintain an approximation of the optimal. Finally, experimental evaluation results on both offline dataset as well as online A/B test demonstrate the effectiveness of CFA-auction.

Autobidding Auctions in the Presence of User Costs

We study autobidding ad auctions with user costs, where each bidder is value-maximizing subject to a return-over-investment (ROI) constraint, and the seller aims to maximize the social welfare taking into consideration the user’s cost of viewing an ad. We show that in the worst case, the approximation ratio of social welfare by running the vanilla VCG auctions with user costs could as bad as 0. To improve the performance of VCG, We propose a new variant of VCG based on properly chosen cost multipliers, and prove that there exist auction-dependent and bidder-dependent cost multipliers that guarantee approximation ratios of 1/2 and 1/4 respectively in terms of the social welfare.

Multitask Peer Prediction With Task-dependent Strategies

Peer prediction aims to incentivize truthful reports from agents whose reports cannot be assessed with any objective ground truthful information. In the multi-task setting where each agent is asked multiple questions, a sequence of mechanisms have been proposed which are truthful — truth-telling is guaranteed to be an equilibrium, or even better, informed truthful — truth-telling is guaranteed to be one of the best-paid equilibria. However, these guarantees assume agents’ strategies are restricted to be task-independent: an agent’s report on a task is not affected by her information about other tasks.

We provide the first discussion on how to design (informed) truthful mechanisms for task-dependent strategies, which allows the agents to report based on all her information on the assigned tasks. We call such stronger mechanisms (informed) omni-truthful. In particular, we propose the joint-disjoint task framework, a new paradigm which builds upon the previous penalty-bonus task framework. First, we show a natural reduction from mechanisms in the penalty-bonus task framework to mechanisms in the joint-disjoint task framework that maps every truthful mechanism to an omni-truthful mechanism. Such a reduction is non-trivial as we show that current penalty-bonus task mechanisms are not, in general, omni-truthful. Second, for a stronger truthful guarantee, we design the matching agreement (MA) mechanism which is informed omni-truthful. Finally, for the MA mechanism in the detail-free setting where no prior knowledge is assumed, we show how many tasks are required to (approximately) retain the truthful guarantees.

Stability and Efficiency of Personalised Cultural Markets

This work is concerned with the dynamics of online cultural markets, namely, attention allocation of many users on a set of digital goods with infinite supply. Such dynamics are important in shaping processes and outcomes in society, from trending items in entertainment, collective knowledge creation, to election outcomes. The outcomes of online cultural markets are susceptible to intricate social influence dynamics, particularly so when the community comprises consumers with heterogeneous interests. This has made formal analysis of these markets improbable. In this paper, we remedy this by establishing robust connections between influence dynamics and optimization processes, in trial-offer markets where the consumer preferences are modelled by multinomial logit. Among other results, we show that the proportional-response-esque influence dynamic is equivalent to stochastic mirror descent on a convex objective function, thus leading to a stable and predictable outcome. When all consumers are homogeneous, the objective function has a natural interpretation as a weighted sum of efficiency and diversity of the culture market. In simulations driven by real-world preferences collected from a large-scale recommender system, we observe that ranking strategies aligned with the underlying heterogeneous preferences are more stable, and achieves higher efficiency and diversity. More broadly, we see this work as the first step in connecting computational methods for classical markets to the problem area of online attention and recommender systems. We hope this result paves the way to posing and answering a diverse set of research questions in this area.

Learning with Exposure Constraints in Recommendation Systems

Recommendation systems are dynamic economic systems that balance the needs of multiple stakeholders. A recent line of work studies incentives from the content providers’ point of view. Content providers, e.g., vloggers and bloggers, contribute fresh content and rely on user engagement to create revenue and finance their operations. In this work, we propose a contextual multi-armed bandit setting to model the dependency of content providers on exposure. In our model, the system receives a user context in every round and has to select one of the arms. Every arm is a content provider who must receive a minimum number of pulls every fixed time period (e.g., a month) to remain viable in later rounds; otherwise, the arm departs and is no longer available. The system aims to maximize the users’ (content consumers) welfare. To that end, it should learn which arms are vital and ensure they remain viable by subsidizing arm pulls if needed. We develop algorithms with sub-linear regret, as well as a lower bound that demonstrates that our algorithms are optimal up to logarithmic factors.

High-Effort Crowds: Limited Liability via Tournaments

We consider the crowdsourcing setting where, in response to the assigned tasks, agents strategically decide both how much effort to exert (from a continuum) and whether to manipulate their reports. The goal is to design payment mechanisms that (1) satisfy limited liability (all payments are non-negative), (2) reduce the principal’s cost of budget, (3) incentivize effort and (4) incentivize truthful responses. In our framework, the payment mechanism composes a performance measurement, which noisily evaluates agents’ effort based on their reports, and a payment function, which converts the scores output by the performance measurement to payments.

Previous literature suggests applying a peer prediction mechanism combined with a linear payment function. This method can achieve either (1), (3) and (4), or (2), (3) and (4) in the binary effort setting. In this paper, we suggest using a rank-order payment function (tournament). Assuming Gaussian noise, we analytically optimize the rank-order payment function, and identify a sufficient statistic, sensitivity, which serves as a metric for optimizing the performance measurements. This helps us obtain (1), (2) and (3) simultaneously. Additionally, we show that adding noise to agents’ scores can preserve the truthfulness of the performance measurements under the non-linear tournament, which gives us all four objectives.

Our real-data estimated agent-based model experiments show that our method can greatly reduce the payment of effort elicitation while preserving the truthfulness of the performance measurement. In addition, we empirically evaluate several commonly used performance measurements in terms of their sensitivities and strategic robustness.

Auctions without commitment in the auto-bidding world

Advertisers in online ad auctions are increasingly using auto-bidding mechanisms to bid into auctions instead of directly bidding their value manually. One of the prominent auto-bidding formats is that of target cost-per-acquisition (tCPA) which maximizes the volume of conversions subject to a return-of-investment constraint. From an auction theoretic perspective however, this trend seems to go against foundational results that postulate that for profit-maximizing (aka quasi-linear) bidders, it is optimal to use a classic bidding system like marginal CPA (mCPA) bidding rather than using strategies like tCPA.

In this paper we rationalize the adoption of such seemingly sub-optimal bidding within the canonical quasi-linear framework. The crux of the argument lies in the notion of commitment. We consider a multi-stage game where first the auctioneer declares the auction rules; then bidders select either the tCPA or mCPA bidding format (and submit bids accordingly); and then, if the auctioneer lacks commitment, it can revisit the rules of the auction (e.g., may readjust reserve prices depending on the bids submitted by the bidders). Our main result is that so long as a bidder believes that the auctioneer lacks commitment to follow the rule of the declared auction then the bidder will make a higher profit by choosing the tCPA format compared to the classic mCPA format.

We then explore the commitment consequences for the auctioneer. In a simplified version of the model where there is only one bidder, we show that the tCPA subgame admits a credible equilibrium while the mCPA format does not. That is, when the bidder chooses the tCPA format the auctioneer can credibly implement the auction rules announced at the beginning of the game. We also show that, under some mild conditions, the auctioneer’s revenue is larger when the bidder uses the tCPA format rather than mCPA. We further quantify the value for the auctioneer to be able to commit to the declared auction rules.

Learning to Bid in Contextual First Price Auctions✱

In this work, we investigate the problem of how to bid in repeated contextual first price auctions for a single learner (the bidder). Concretely, at each time t, the learner receives a context and decides the bid based on historical information and xt. We assume that the maximum bid of all the others follows a linear model, i.e., mt = ⟨α0, xt⟩ + zt, where is unknown to the learner and zt is randomly sampled from a noise distribution with log-concave density. In this work, we consider both binary feedback (the learner can only observe whether she wins or not) and full information feedback (the learner can observe mt) models of the learner. For binary feedback, when the noise distribution is (partially) known, we propose a bidding algorithm that achieves at most regret, where Δ is a constant reserve in first price auctions.1 For the full information feedback with unknown noise distribution, we provide an algorithm that achieves regret at most . Our approach combines an estimator for log-concave density functions and the Maximum Likelihood Estimation (MLE) method to learn the noise distribution and linear weight α0 simultaneously. We complement our results with a lower bound result such that any bidding policy in a broad class must achieve regret at least , even when the learner receives the full information feedback and is known.

Online resource allocation in Markov Chains

A large body of work in Computer Science and Operations Research study online algorithms for stochastic resource allocation problems. The most common assumption is that the online requests have randomly generated i.i.d. types. This assumption is well justified for static markets and/or relatively short time periods. We consider dynamic markets, whose states evolve as a random walk in a market-specific Markov Chain. This is a new model that generalizes previous i.i.d. settings. We identify important parameters of the Markov chain that is crucial for obtaining good approximation guarantees to the expected value of the optimal offline algorithm which knows realizations of all requests in advance. We focus on a stylized single-resource setting and: (i) generalize the well-known Prophet Inequality from the optimal stopping theory (single-unit setting) to Markov Chain setting; (ii) in multi-unit setting, design a simple algorithm that is asymptotically optimal under mild assumptions on the underlying Markov chain.

Worst-Case Welfare of Item Pricing in the Tollbooth Problem

We study the worst-case welfare of item pricing in the tollbooth problem. The problem was first introduced by Guruswami et al. [27], and is a special case of the combinatorial auction in which (i) each of the m items in the auction is an edge of some underlying graph; and (ii) each of the n buyers is single-minded and only interested in buying all edges of a single path. We consider the competitive ratio between the hindsight optimal welfare and the optimal worst-case welfare among all item-pricing mechanisms, when the order of the arriving buyers is adversarial. We assume that buyers own the tie-breaking power, i.e. they can choose whether or not to buy the demand path at 0 utility. We prove a tight competitive ratio of 3/2 when the underlying graph is a single path (also known as the highway problem), whereas item-pricing can achieve the hindsight optimal if the seller is allowed to choose a proper tie-breaking rule to maximize the welfare [6, 11]. Moreover, we prove an O(1) upper bound of competitive ratio when the underlying graph is a tree.

For general graphs, we prove an Ω(m1/8) lower bound of the competitive ratio. We show that an mΩ(1) competitive ratio is unavoidable even if the graph is a grid, or if the capacity of every edge is augmented by a constant factor c. The results hold even if the seller has tie-breaking power.

Dynamic Interventions for Networked Contagions

We study the problem of designing dynamic intervention policies for minimizing cascading failures in online financial networks, as well we more general demand-supply networks. Formally, we consider a dynamic version of the celebrated Eisenberg-Noe model of financial network liabilities, and use this to study the design of external intervention policies. Our controller has a fixed resource budget in each round, and can use this to minimize the effect of demand/supply shocks in the network. We formulate the optimal intervention problem as a Markov Decision Process, and show how we can leverage the problem structure to efficiently compute optimal intervention policies with continuous interventions, and give approximation algorithms in the case of discrete interventions. Going beyond financial networks, we argue that our model captures dynamic network intervention in a much broader class of dynamic demand/supply settings with networked inter-dependencies. To demonstrate this, we apply our intervention algorithms to a wide variety of Web-related application domains, including ridesharing, online transaction platforms, and financial networks with agent mobility; in each case, we study the relationship between node centrality and intervention strength, as well as fairness properties of the optimal interventions.

Randomized Pricing with Deferred Acceptance for Revenue Maximization with Submodular Objectives

A lot of applications in web economics need to maximize the revenue under a budget for payments and also guarantee the truthfulness of users, so Budget-Feasible Mechanism (BFM) Design has aroused great interests during last decade. Most of the existing BFMs concentrate on maximizing a monotone submodular function subject to a knapsack constraint, which is insufficient for many applications with complex objectives or constraints. Observing this, the recent studies (e.g., [4, 5, 11]) have considered non-monotone submodular objectives or more complex constraints such as a k-system constraint. In this study, we follow this line of research and propose truthful BFMs with improved performance bounds for non-monotone submodular objectives with or without a k-system constraint. Our BFMs leverage the idea of providing random prices to users while deferring the decision on the final winning set, and are also based on a novel randomized algorithm for the canonical constrained submodular maximization problem achieving better performance bounds compared to the state-of-the-art. Finally, the effectiveness and efficiency of our approach are demonstrated by extensive experiments on several applications about social network marketing, crowdsourcing and personalized recommendation.

Eligibility Mechanisms: Auctions Meet Information Retrieval

The design of internet advertisement systems is both an auction design problem and an information retrieval (IR) problem. As an auction, the designer needs to take the participants incentives into account. As an information retrieval problem, it needs to identify the ad that it is the most relevant to a user out of an enormous set of ad candidates. Those aspects are combined by first having an IR system narrow down the initial set of ad candidates to a manageable size followed by an auction that ranks and prices those candidates.

If the IR system uses information about bids, agents could in principle manipulate the system by manipulating the IR stage even when the subsequent auction is truthful. In this paper we investigate the design of truthful IR mechanisms, which we term eligibility mechanisms. We model it as a truthful version of the stochastic probing problem. We show that there is a constant gap between the truthful and non-truthful versions of the stochastic probing problem and exhibit a constant approximation algorithm. En route, we also characterize the set of eligibility mechanisms, which provides necessary and sufficient conditions for an IR system to be truthful.

Online Bidding Algorithms for Return-on-Spend Constrained Advertisers✱

We study online auto-bidding algorithms for a single advertiser maximizing value under the Return-on-Spend (RoS) constraint, quantifying performance in terms of regret relative to the optimal offline solution that knows all queries a priori. We contribute a simple online algorithm that achieves near-optimal regret in expectation while always respecting the RoS constraint when the input queries are i.i.d. samples from some distribution. Integrating our results with  [9] achieves near-optimal regret under both RoS and fixed budget constraints. Our algorithm uses the primal-dual framework with online mirror descent (OMD) for the dual updates, and the analysis utilizes new insights into the gradient structure.

Efficiency of Non-Truthful Auctions in Auto-bidding: The Power of Randomization

Auto-bidding is now widely adopted as an interface between advertisers and internet advertising as it allows advertisers to specify high-level goals, such as maximizing value subject to a value-per-spend constraint. Prior research has mainly focused on auctions that are truthful (such as a second-price auction) because these auctions admit simple (uniform) bidding strategies and are thus simpler to analyze. The main contribution of this paper is to characterize the efficiency across the spectrum of all auctions, including non-truthful auctions for which optimal bidding may be complex.

For deterministic auctions, we show a dominance result: any uniform bidding equilibrium of a second-price auction (SPA) can be mapped to an equilibrium of any other auction – for example, first price auction (FPA) – with identical outcomes. In this sense, SPA with uniform bidding is an instance-wise optimal deterministic auction. Consequently, the price of anarchy (PoA) of any deterministic auction is at least the PoA of SPA with uniform bidding, which is known to be 2. We complement this by showing that the PoA of FPA without uniform bidding is 2.

Next, we show, surprisingly, that truthful pricing is not dominant in the randomized setting. There is a randomized version of FPA that achieves a strictly smaller price of anarchy than its truthful counterpart when there are two bidders per query. Furthermore, this randomized FPA achieves the best-known PoA for two bidders, thus showing the power of non-truthfulness when combined with randomization. Finally, we show that no prior-free auction (even randomized, non-truthful) can improve on a PoA bound of 2 when there are a large number of advertisers per auction. These results should be interpreted qualitatively as follows. When the auction pressure is low, randomization and non-truthfulness is beneficial. On the other hand, if the auction pressure is intense, the benefits diminishes and it is optimal to implement a second-price auction.

Fairness-aware Guaranteed Display Advertising Allocation under Traffic Cost Constraint

Real-time Bidding (RTB) and Guaranteed Display (GD) advertising are two primary ways to sell impressions for publishers in online display advertising. Although GD contract serves less efficiently compared to RTB ads, it helps advertisers reach numerous target audiences at a lower cost and allows publishers to increase overall advertising revenue. However, with billion-scale requests online per day, it’s a challenging problem for publishers to decide whether and which GD ad to display for each impression. In this paper, we propose an optimal allocation model for GD contracts considering optimizing three objectives: maximizing guaranteed delivery and impressions’ quality and minimizing the extra traffic cost of GD contracts to increase overall revenue. The traffic cost of GD contracts is defined as the potential expected revenue if the impression is allocated to RTB ads. Our model dynamically adjusts the weights for each GD contract between impressions’ quality and traffic cost based on real-time performance, which produces fairness-aware allocation results. A parallel training framework based on Parameter-Server (PS) architecture is utilized to efficiently and periodically update the model. Deriving from the allocation model, we also propose a simple and adaptive online bidding strategy for GD contracts, which can be updated quickly by feedback-based algorithms to achieve optimal impression allocation even in complex and dynamic environments. We demonstrate the effectiveness of our proposed method by using both offline evaluation and online A/B testing.

Is your digital neighbor a reliable investment advisor?

The web and social media platforms have drastically changed how investors produce and consume financial advice. Historically, individual investors were often relying on newsletters and related prospectus backed by the reputation and track record of their issuers. Nowadays, financial advice is frequently offered online, by anonymous or pseudonymous parties with little at stake. As such, a natural question is to investigate whether these modern financial “influencers” operate in good faith, or whether they might be misleading their followers intentionally. To start answering this question, we obtained data from a very large cryptocurrency derivatives exchange, from which we derived individual trading positions. Some of the investors on that platform elect to link to their Twitter profiles. We were thus able to compare the positions publicly espoused on Twitter with those actually taken in the market. We discovered that 1) staunchly “bullish” investors on Twitter often took much more moderate, if not outright opposite, positions in their own trades when the market was down, 2) their followers tended to align their positions with bullish Twitter outlooks, and 3) moderate voices on Twitter (and their own followers) were on the other hand far more consistent with their actual investment strategies. In other words, while social media advice may attempt to foster a sense of camaraderie among people of like-minded beliefs, the reality is that this is merely an illusion, which may result in financial losses for people blindly following advice.

Platform Behavior under Market Shocks: A Simulation Framework and Reinforcement-Learning Based Study

We study the behavior of an economic platform (e.g., Amazon, Uber Eats, Instacart) under shocks, such as COVID-19 lockdowns, and the effect of different regulation considerations. To this end, we develop a multi-agent simulation environment of a platform economy in a multi-period setting where shocks may occur and disrupt the economy. Buyers and sellers are heterogeneous and modeled as economically-motivated agents, choosing whether or not to pay fees to access the platform. We use deep reinforcement learning to model the fee-setting and matching behavior of the platform, and consider two major types of regulation frameworks: (1) taxation policies and (2) platform fee restrictions. We offer a number of simulated experiments that cover different market settings and shed light on regulatory tradeoffs. Our results show that while many interventions are ineffective with a sophisticated platform actor, we identify a particular kind of regulation—fixing fees to the optimal, no-shock fees while still allowing a platform to choose how to match buyers and sellers—as holding promise for promoting the efficiency and resilience of the economic system.

Near-Optimal Experimental Design Under the Budget Constraint in Online Platforms

A/B testing, or controlled experiments, is the gold standard approach to causally compare the performance of algorithms on online platforms. However, conventional Bernoulli randomization in A/B testing faces many challenges such as spillover and carryover effects. Our study focuses on another challenge, especially for A/B testing on two-sided platforms – budget constraints. Buyers on two-sided platforms often have limited budgets, where the conventional A/B testing may be infeasible to be applied, partly because two variants of allocation algorithms may conflict and lead some buyers to exceed their budgets if they are implemented simultaneously. We develop a model to describe two-sided platforms where buyers have limited budgets. We then provide an optimal experimental design that guarantees small bias and minimum variance. Bias is lower when there is more budget and a higher supply-demand rate. We test our experimental design on both synthetic data and real-world data, which verifies the theoretical results and shows our advantage compared to Bernoulli randomization.

Impartial Selection with Prior Information

We study the problem of impartial selection, a topic that lies at the intersection of computational social choice and mechanism design. The goal is to select the most popular individual among a set of community members. The input can be modeled as a directed graph, where each node represents an individual, and a directed edge indicates nomination or approval of a community member to another. An impartial mechanism is robust to potential selfish behavior of the individuals and provides appropriate incentives to voters to report their true preferences by ensuring that the chance of a node to become a winner does not depend on its outgoing edges. The goal is to design impartial mechanisms that select a node with an in-degree that is as close as possible to the highest in-degree. We measure the efficiency of such a mechanism by the difference of these in-degrees, known as its additive approximation.

Following the success in the design of auction and posted pricing mechanisms with good approximation guarantees for welfare and profit maximization, we study the extent to which prior information on voters’ preferences could be useful in the design of efficient deterministic impartial selection mechanisms with good additive approximation guarantees. We consider three models of prior information, which we call the opinion poll, the a priori popularity, and the uniform model. We analyze the performance of a natural selection mechanism that we call approval voting with default (AVD) and show that it achieves a additive guarantee for opinion poll and a for a priori popularity inputs, where n is the number of individuals. We consider this polylogarithmic bound as our main technical contribution. We complement this last result by showing that our analysis is close to tight, showing an Ω(ln n) lower bound. This holds in the uniform model, which is the simplest among the three models.

SESSION: Fairness, Accountability, Transparency and Ethics on the Web

Maximizing Submodular Functions for Recommendation in the Presence of Biases

Subset selection tasks, arise in recommendation systems and search engines and ask to select a subset of items that maximize the value for the user. The values of subsets often display diminishing returns, and hence, submodular functions have been used to model them. If the inputs defining the submodular function are known, then existing algorithms can be used. In many applications, however, inputs have been observed to have social biases that reduce the utility of the output subset. Hence, interventions to improve the utility are desired. Prior works focus on maximizing linear functions—a special case of submodular functions—and show that fairness constraint-based interventions can not only ensure proportional representation but also achieve near-optimal utility in the presence of biases. We study the maximization of a family of submodular functions that capture functions arising in the aforementioned applications. Our first result is that, unlike linear functions, constraint-based interventions cannot guarantee any constant fraction of the optimal utility for this family of submodular functions. Our second result is an algorithm for submodular maximization. The algorithm provably outputs subsets that have near-optimal utility for this family under mild assumptions and that proportionally represent items from each group. In empirical evaluation, with both synthetic and real-world data, we observe that this algorithm improves the utility of the output subset for this family of submodular functions over baselines.

Do Language Models Plagiarize?

Past literature has illustrated that language models (LMs) often memorize parts of training instances and reproduce them in natural language generation (NLG) processes. However, it is unclear to what extent LMs “reuse” a training corpus. For instance, models can generate paraphrased sentences that are contextually similar to training samples. In this work, therefore, we study three types of plagiarism (i.e., verbatim, paraphrase, and idea) among GPT-2 generated texts, in comparison to its training data, and further analyze the plagiarism patterns of fine-tuned LMs with domain-specific corpora which are extensively used in practice. Our results suggest that (1) three types of plagiarism widely exist in LMs beyond memorization, (2) both size and decoding methods of LMs are strongly associated with the degrees of plagiarism they exhibit, and (3) fine-tuned LMs’ plagiarism patterns vary based on their corpus similarity and homogeneity. Given that a majority of LMs’ training data is scraped from the Web without informing content owners, their reiteration of words, phrases, and even core ideas from training sets into generated texts has ethical implications. Their patterns are likely to exacerbate as both the size of LMs and their training data increase, raising concerns about indiscriminately pursuing larger models with larger training corpora. Plagiarized content can also contain individuals’ personal and sensitive information. These findings overall cast doubt on the practicality of current LMs in mission-critical writing tasks and urge more discussions around the observed phenomena. Data and source code are available at https://github.com/Brit7777/LM-plagiarism.

Scoping Fairness Objectives and Identifying Fairness Metrics for Recommender Systems: The Practitioners’ Perspective

Measuring and assessing the impact and “fairness’’ of recommendation algorithms is central to responsible recommendation efforts. However, the complexity of fairness definitions and the proliferation of fairness metrics in research literature have led to a complex decision-making space. This environment makes it challenging for practitioners to operationalize and pick metrics that work within their unique context. This suggests that practitioners require more decision-making support, but it is not clear what type of support would be beneficial. We conducted a literature review of 24 papers to gather metrics introduced by the research community for measuring fairness in recommendation and ranking systems. We organized these metrics into a ‘decision-tree style’ support framework designed to help practitioners scope fairness objectives and identify fairness metrics relevant to their recommendation domain and application context. To explore the feasibility of this approach, we conducted 15 semi-structured interviews using this framework to assess which challenges practitioners may face when scoping fairness objectives and metrics for their system, and which further support may be needed beyond such tools.

Simplistic Collection and Labeling Practices Limit the Utility of Benchmark Datasets for Twitter Bot Detection

Accurate bot detection is necessary for the safety and integrity of online platforms. It is also crucial for research on the influence of bots in elections, the spread of misinformation, and financial market manipulation. Platforms deploy infrastructure to flag or remove automated accounts, but their tools and data are not publicly available. Thus, the public must rely on third-party bot detection. These tools employ machine learning and often achieve near-perfect performance for classification on existing datasets, suggesting bot detection is accurate, reliable and fit for use in downstream applications. We provide evidence that this is not the case and show that high performance is attributable to limitations in dataset collection and labeling rather than sophistication of the tools. Specifically, we show that simple decision rules — shallow decision trees trained on a small number of features — achieve near-state-of-the-art performance on most available datasets and that bot detection datasets, even when combined together, do not generalize well to out-of-sample datasets. Our findings reveal that predictions are highly dependent on each dataset’s collection and labeling procedures rather than fundamental differences between bots and humans. These results have important implications for both transparency in sampling and labeling procedures and potential biases in research using existing bot detection tools for pre-processing.

A Method to Assess and Explain Disparate Impact in Online Retailing

This paper presents a method for assessing whether algorithmic decision making induces disparate impact in online retailing. The proposed method specifies a statistical design, a sampling algorithm, and a technological setup for data collection through web crawling. The statistical design reduces the dimensionality of the problem and ensures that the data collected are representative, variation-rich, and suitable for the investigation of the causes behind any observed disparities. Implementations of the method can collect data on algorithmic decisions, such as price, recommendations, and delivery fees that can be matched to website visitor demographic data from established sources such as censuses and large scale surveys. The combined data can be used to investigate the presence and causes of disparate impact, potentially helping online retailers audit their algorithms without collecting or holding the demographic data of their users. The proposed method is illustrated in the context of the automated pricing decisions of a leading retailer in the United States. A custom-built platform implemented the method to collect data for nearly 20,000 different grocery products at more than 3,000 randomly-selected zip codes. The data collected indicates that prices are higher for locations with high proportions of minority households. Although these price disparities can be partly attributed to algorithmic biases, they are mainly explained by local factors and therefore can be regarded as business necessities.

Path-specific Causal Fair Prediction via Auxiliary Graph Structure Learning

With ubiquitous adoption of machine learning algorithms in web technologies, such as recommendation system and social network, algorithm fairness has become a trending topic, and it has a great impact on social welfare. Among different fairness definitions, path-specific causal fairness is a widely adopted one with great potentials, as it distinguishes the fair and unfair effects that the sensitive attributes exert on algorithm predictions. Existing methods based on path-specific causal fairness either require graph structure as the prior knowledge or have high complexity in the calculation of path-specific effect. To tackle these challenges, we propose a novel casual graph based fair prediction framework which integrates graph structure learning into fair prediction to ensure that unfair pathways are excluded in the causal graph. Furthermore, we generalize the proposed framework to the scenarios where sensitive attributes can be non-root nodes and affected by other variables, which is commonly observed in real-world applications, such as recommendation system, but hardly addressed by existing works. We provide theoretical analysis on the generalization bound for the proposed fair prediction method, and conduct a series of experiments on real-world datasets to demonstrate that the proposed framework can provide better prediction performance and algorithm fairness trade-off.

Same Same, But Different: Conditional Multi-Task Learning for Demographic-Specific Toxicity Detection

Algorithmic bias often arises as a result of differential subgroup validity, in which predictive relationships vary across groups. For example, in toxic language detection, comments targeting different demographic groups can vary markedly across groups. In such settings, trained models can be dominated by the relationships that best fit the majority group, leading to disparate performance. We propose framing toxicity detection as multi-task learning (MTL), allowing a model to specialize on the relationships that are relevant to each demographic group while also leveraging shared properties across groups. With toxicity detection, each task corresponds to identifying toxicity against a particular demographic group. However, traditional MTL requires labels for all tasks to be present for every data point. To address this, we propose Conditional MTL (CondMTL), wherein only training examples relevant to the given demographic group are considered by the loss function. This lets us learn group specific representations in each branch which are not cross contaminated by irrelevant labels. Results on synthetic and real data show that using CondMTL improves predictive recall over various baselines in general and for the minority demographic group in particular, while having similar overall accuracy.

P-MMF: Provider Max-min Fairness Re-ranking in Recommender System

In this paper, we address the issue of recommending fairly from the aspect of providers, which has become increasingly essential in multistakeholder recommender systems. Existing studies on provider fairness usually focused on designing proportion fairness (PF) metrics that first consider systematic fairness. However, sociological researches show that to make the market more stable, max-min fairness (MMF) is a better metric. The main reason is that MMF aims to improve the utility of the worst ones preferentially, guiding the system to support the providers in weak market positions. When applying MMF to recommender systems, how to balance user preferences and provider fairness in an online recommendation scenario is still a challenging problem. In this paper, we proposed an online re-ranking model named Provider Max-min Fairness Re-ranking (P-MMF) to tackle the problem. Specifically, P-MMF formulates provider fair recommendation as a resource allocation problem, where the exposure slots are considered the resources to be allocated to providers and the max-min fairness is used as the regularizer during the process. We show that the problem can be further represented as a regularized online optimizing problem and solved efficiently in its dual space. During the online re-ranking phase, a momentum gradient descent method is designed to conduct the dynamic re-ranking. Theoretical analysis showed that the regret of P-MMF can be bounded. Experimental results on four public recommender datasets demonstrated that P-MMF can outperformed the state-of-the-art baselines. Experimental results also show that P-MMF can retain small computationally costs on a corpus with the large number of items.

Towards Explainable Collaborative Filtering with Taste Clusters Learning

Collaborative Filtering (CF) is a widely used and effective technique for recommender systems. In recent decades, there have been significant advancements in latent embedding-based CF methods for improved accuracy, such as matrix factorization, neural collaborative filtering, and LightGCN. However, the explainability of these models has not been fully explored. Adding explainability to recommendation models can not only increase trust in the decision-making process, but also have multiple benefits such as providing persuasive explanations for item recommendations, creating explicit profiles for users and items, and assisting item producers in design improvements.

In this paper, we propose a neat and effective Explainable Collaborative Filtering (ECF) model that leverages interpretable cluster learning to achieve the two most demanding objectives: (1) Precise - the model should not compromise accuracy in the pursuit of explainability; and (2) Self-explainable - the model’s explanations should truly reflect its decision-making process, not generated from post-hoc methods. The core of ECF is mining taste clusters from user-item interactions and item profiles. We map each user and item to a sparse set of taste clusters, and taste clusters are distinguished by a few representative tags. The user-item preference, users/items’ cluster affiliations, and the generation of taste clusters are jointly optimized in an end-to-end manner. Additionally, we introduce a forest mechanism to ensure the model’s accuracy, explainability, and diversity. To comprehensively evaluate the explainability quality of taste clusters, we design several quantitative metrics, including in-cluster item coverage, tag utilization, silhouette, and informativeness. Our model’s effectiveness is demonstrated through extensive experiments on three real-world datasets.

Fairly Adaptive Negative Sampling for Recommendations

Pairwise learning strategies are prevalent for optimizing recommendation models on implicit feedback data, which usually learns user preference by discriminating between positive (i.e., clicked by a user) and negative items (i.e., obtained by negative sampling). However, the size of different item groups (specified by item attribute) is usually unevenly distributed. We empirically find that the commonly used uniform negative sampling strategy for pairwise algorithms (e.g., BPR) can inherit such data bias and oversample the majority item group as negative instances, severely countering group fairness on the item side. In this paper, we propose a Fairly adaptive Negative sampling approach (FairNeg), which improves item group fairness via adaptively adjusting the group-level negative sampling distribution in the training process. In particular, it first perceives the model’s unfairness status at each step and then adjusts the group-wise sampling distribution with an adaptive momentum update strategy for better facilitating fairness optimization. Moreover, a negative sampling distribution Mixup mechanism is proposed, which gracefully incorporates existing importance-aware sampling techniques intended for mining informative negative samples, thus allowing for achieving multiple optimization purposes. Extensive experiments on four public datasets show our proposed method’s superiority in group fairness enhancement and fairness-utility tradeoff.

HateProof: Are Hateful Meme Detection Systems really Robust?

Exploiting social media to spread hate has tremendously increased over the years. Lately, multi-modal hateful content such as memes has drawn relatively more traction than uni-modal content. Moreover, the availability of implicit content payloads makes them fairly challenging to be detected by existing hateful meme detection systems. In this paper, we present a use case study to analyze such systems’ vulnerabilities against external adversarial attacks. We find that even very simple perturbations in uni-modal and multi-modal settings performed by humans with little knowledge about the model can make the existing detection models highly vulnerable. Empirically, we find a noticeable performance drop of as high as 10% in the macro-F1 score for certain attacks. As a remedy, we attempt to boost the model’s robustness using contrastive learning as well as an adversarial training-based method - VILLA. Using an ensemble of the above two approaches, in two of our high resolution datasets, we are able to (re)gain back the performance to a large extent for certain attacks. We believe that ours is a first step toward addressing this crucial problem in an adversarial setting and would inspire more such investigations in the future.

Towards Fair Allocation in Social Commerce Platforms

Social commerce platforms are emerging businesses where producers sell products through re-sellers who advertise the products to other customers in their social network. Due to the increasing popularity of this business model, thousands of small producers and re-sellers are starting to depend on these platforms for their livelihood; thus, it is important to provide fair earning opportunities to them. The enormous product space in such platforms prohibits manual search, and motivates the need for recommendation algorithms to effectively allocate product exposure and, consequently, earning opportunities. In this work, we focus on the fairness of such allocations in social commerce platforms and formulate the problem of assigning products to re-sellers as a fair division problem with indivisible items under two-sided cardinality constraints, wherein each product must be given to at least a certain number of re-sellers and each re-seller must get a certain number of products.

Our work systematically explores various well-studied benchmarks of fairness—including Nash social welfare, envy-freeness up to one item (EF1), and equitability up to one item (EQ1)—from both theoretical and experimental perspectives. We find that the existential and computational guarantees of these concepts known from the unconstrained setting do not extend to our constrained model. To address this limitation, we develop a mixed-integer linear program and other scalable heuristics that provide near-optimal approximation of Nash social welfare in simulated and real social commerce datasets. Overall, our work takes the first step towards achieving provable fairness alongside reasonable revenue guarantees on social commerce platforms.

Fairness-Aware Clique-Preserving Spectral Clustering of Temporal Graphs

With the widespread development of algorithmic fairness, there has been a surge of research interest that aims to generalize the fairness notions from the attributed data to the relational data (graphs). The vast majority of existing work considers the fairness measure in terms of the low-order connectivity patterns (e.g., edges), while overlooking the higher-order patterns (e.g., k-cliques) and the dynamic nature of real-world graphs. For example, preserving triangles from graph cuts during clustering is the key to detecting compact communities; however, if the clustering algorithm only pays attention to triangle-based compactness, then the returned communities lose the fairness guarantee for each group in the graph. Furthermore, in practice, when the graph (e.g., social networks) topology constantly changes over time, one natural question is how can we ensure the compactness and demographic parity at each timestamp efficiently. To address these problems, we start from the static setting and propose a spectral method that preserves clique connections and incorporates demographic fairness constraints in returned clusters at the same time. To make this static method fit for the dynamic setting, we propose two core techniques, Laplacian Update via Edge Filtering and Searching and Eigen-Pairs Update with Singularity Avoided. Finally, all proposed components are combined into an end-to-end clustering framework named F-SEGA, and we conduct extensive experiments to demonstrate the effectiveness, efficiency, and robustness of F-SEGA.

DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision

Algorithmic fairness has become an important machine learning problem, especially for mission-critical Web applications. This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations. Unlike existing models that target a single type of fairness, our model jointly optimizes for two fairness criteria—group fairness and counterfactual fairness—and hence makes fairer predictions at both the group and individual levels. Our model uses contrastive loss to generate embeddings that are indistinguishable for each protected group, while forcing the embeddings of counterfactual pairs to be similar. It then uses a self-knowledge distillation method to maintain the quality of representation for the downstream tasks. Extensive analysis over multiple datasets confirms the model’s validity and further shows the synergy of jointly addressing two fairness criteria, suggesting the model’s potential value in fair intelligent Web applications.

Fairness in model-sharing games

In many real-world situations, data is distributed across multiple self-interested agents. These agents can collaborate to build a machine learning model based on data from multiple agents, potentially reducing the error each experiences. However, sharing models in this way raises questions of fairness: to what extent can the error experienced by one agent be significantly lower than the error experienced by another agent in the same coalition? In this work, we consider two notions of fairness that each may be appropriate in different circumstances: egalitarian fairness (which aims to bound how dissimilar error rates can be) and proportional fairness (which aims to reward players for contributing more data). We similarly consider two common methods of model aggregation, one where a single model is created for all agents (uniform), and one where an individualized model is created for each agent. For egalitarian fairness, we obtain a tight multiplicative bound on how widely error rates can diverge between agents collaborating (which holds for both aggregation methods). For proportional fairness, we show that the individualized aggregation method always gives a small player error that is upper bounded by proportionality. For uniform aggregation, we show that this upper bound is guaranteed for any individually rational coalition (where no player wishes to leave to do local learning).

PaGE-Link: Path-based Graph Neural Network Explanation for Heterogeneous Link Prediction

Transparency and accountability have become major concerns for black-box machine learning (ML) models. Proper explanations for the model behavior increase model transparency and help researchers develop more accountable models. Graph neural networks (GNN) have recently shown superior performance in many graph ML problems than traditional methods, and explaining them has attracted increased interest. However, GNN explanation for link prediction (LP) is lacking in the literature. LP is an essential GNN task and corresponds to web applications like recommendation and sponsored search on web. Given existing GNN explanation methods only address node/graph-level tasks, we propose Path-based GNN Explanation for heterogeneous Link prediction (PaGE-Link) that generates explanations with connection interpretability, enjoys model scalability, and handles graph heterogeneity. Qualitatively, PaGE-Link can generate explanations as paths connecting a node pair, which naturally captures connections between the two nodes and easily transfer to human-interpretable explanations. Quantitatively, explanations generated by PaGE-Link improve AUC for recommendation on citation and user-item graphs by 9 - 35% and are chosen as better by 78.79% of responses in human evaluation.

SESSION: Crowdsourcing and Human Computation

Combining Worker Factors for Heterogeneous Crowd Task Assignment

Optimising the assignment of tasks to workers is an effective approach to ensure high quality in crowdsourced data - particularly in heterogeneous micro tasks. However, previous attempts at heterogeneous micro task assignment based on worker characteristics are limited to using cognitive skills, despite literature emphasising that worker performance varies based on other parameters. This study is an initial step towards understanding whether and how multiple parameters such as cognitive skills, mood, personality, alertness, comprehension skill, and social and physical context of workers can be leveraged in tandem to improve worker performance estimations in heterogeneous micro tasks. Our predictive models indicate that these parameters have varying effects on worker performance in the five task types considered – sentiment analysis, classification, transcription, named entity recognition and bounding box. Moreover, we note 0.003 - 0.018 reduction in mean absolute error of predicted worker accuracy across all tasks, when task assignment is based on models that consider all parameters vs. models that only consider workers’ cognitive skills. Our findings pave the way for the use of holistic approaches in micro task assignment that effectively quantify worker context.

Hidden Indicators of Collective Intelligence in Crowdfunding

Extensive literature argues that crowds possess essential collective intelligence benefits that allow superior decision-making by untrained individuals working in low-information environments. Classic wisdom of crowds theory is based on evidence gathered from studying large groups of diverse and independent decision-makers. Yet, most human decisions are reached in online settings of interconnected like-minded people that challenge these criteria. This observation raises a key question: Are there surprising expressions of collective intelligence online? Here, we explore whether crowds furnish collective intelligence benefits in crowdfunding systems. Crowdfunding has grown and diversified quickly over the past decade, expanding from funding aspirant creative works and supplying pro-social donations to enabling large citizen-funded urban projects and providing commercial interest-based unsecured loans. Using nearly 10 million loan contributions from a market-dominant lending platform, we find evidence for collective intelligence indicators in crowdfunding. Our results, which are based on a two-stage Heckman selection model, indicate that opinion diversity and the speed at which funds are contributed predict who gets funded and who repays, even after accounting for traditional measures of creditworthiness. Moreover, crowds work consistently well in correctly assessing the outcome of high-risk projects. Finally, diversity and speed serve as early warning signals when inferring fundraising based solely on the initial part of the campaign. Our findings broaden the field of crowd-aware system design and inform discussions about the augmentation of traditional financing systems with tech innovations.

A Dataset on Malicious Paper Bidding in Peer Review

In conference peer review, reviewers are often asked to provide “bids” on each submitted paper that express their interest in reviewing that paper. A paper assignment algorithm then uses these bids (along with other data) to compute a high-quality assignment of reviewers to papers. However, this process has been exploited by malicious reviewers who strategically bid in order to unethically manipulate the paper assignment, crucially undermining the peer review process. For example, these reviewers may aim to get assigned to a friend’s paper as part of a quid-pro-quo deal. A critical impediment towards creating and evaluating methods to mitigate this issue is the lack of any publicly-available data on malicious paper bidding. In this work, we collect and publicly release a novel dataset to fill this gap, collected from a mock conference activity where participants were instructed to bid either honestly or maliciously. We further provide a descriptive analysis of the bidding behavior, including our categorization of different strategies employed by participants. Finally, we evaluate the ability of each strategy to manipulate the assignment, and also evaluate the performance of some simple algorithms meant to detect malicious bidding. The performance of these detection algorithms can be taken as a baseline for future research on detecting malicious bidding.

Multiview Representation Learning from Crowdsourced Triplet Comparisons

Crowdsourcing has been used to collect data at scale in numerous fields. Triplet similarity comparison is a type of crowdsourcing task, in which crowd workers are asked the question “among three given objects, which two are more similar?”, which is relatively easy for humans to answer. However, the comparison can be sometimes based on multiple views, i.e., different independent attributes such as color and shape. Each view may lead to different results for the same three objects. Although an algorithm was proposed in prior work to produce multiview embeddings, it involves at least two problems: (1) the existing algorithm cannot independently predict multiview embeddings for a new sample, and (2) different people may prefer different views. In this study, we propose an end-to-end inductive deep learning framework to solve the multiview representation learning problem. The results show that our proposed method can obtain multiview embeddings of any object, in which each view corresponds to an independent attribute of the object. We collected two datasets from a crowdsourcing platform to experimentally investigate the performance of our proposed approach compared to conventional baseline methods.

HybridEval: A Human-AI Collaborative Approach for Evaluating Design Ideas at Scale

Evaluating design ideas is necessary to predict their success and assess their impact early on in the process. Existing methods rely either on metrics computed by systems that are effective but subject to errors and bias, or experts’ ratings, which are accurate but expensive and long to collect. Crowdsourcing offers a compelling way to evaluate a large number of design ideas in a short amount of time while being cost-effective. Workers’ evaluation is, however, less reliable and might substantially differ from experts’ evaluation.

In this work, we investigate workers’ rating behavior and compare it with experts. First, we instrument a crowdsourcing study where we asked workers to evaluate design ideas from three innovation challenges. We show that workers share similar insights with experts but tend to rate more generously and weigh certain criteria more importantly. Next, we develop a hybrid human-AI approach that combines a machine learning model with crowdsourcing to evaluate ideas. Our approach models workers’ reliability and bias while leveraging ideas’ textual content to train a machine learning model. It is able to incorporate experts’ ratings whenever available, to supervise the model training and infer worker performance. Results show that our framework outperforms baseline methods and requires significantly less training data from experts, thus providing a viable solution for evaluating ideas at scale.

Sedition Hunters: A Quantitative Study of the Crowdsourced Investigation into the 2021 U.S. Capitol Attack

Social media platforms have enabled extremists to organize violent events, such as the 2021 U.S. Capitol Attack. Simultaneously, these platforms enable professional investigators and amateur sleuths to collaboratively collect and identify imagery of suspects with the goal of holding them accountable for their actions. Through a case study of Sedition Hunters, a Twitter community whose goal is to identify individuals who participated in the 2021 U.S. Capitol Attack, we explore what are the main topics or targets of the community, who participates in the community, and how. Using topic modeling, we find that information sharing is the main focus of the community. We also note an increase in awareness of privacy concerns. Furthermore, using social network analysis, we show how some participants played important roles in the community. Finally, we discuss implications for the content and structure of online crowdsourced investigations.

Human-in-the-loop Regular Expression Extraction for Single Column Format Inconsistency

Format inconsistency is one of the most frequently appearing data quality issues encountered during data cleaning. Existing automated approaches commonly lack applicability and generalisability, while approaches with human inputs typically require specialized skills such as writing regular expressions. This paper proposes a novel hybrid human-machine system, namely “Data-Scanner-4C”, which leverages crowdsourcing to address syntactic format inconsistencies in a single column effectively. We first ask crowd workers to create examples from single-column data through “data selection” and “result validation” tasks. Then, we propose and use a novel rule-based learning algorithm to infer the regular expressions that propagate formats from created examples to the entire column. Our system integrates crowdsourcing and algorithmic format extraction techniques in a single workflow. Having human experts write regular expressions is no longer required, thereby reducing both the time as well as the opportunity for error. We conducted experiments through both synthetic and real-world datasets, and our results show how the proposed approach is applicable and effective across data types and formats.

SESSION: The Creative Web

Identifying Creative Harmful Memes via Prompt based Approach

The creative nature of memes has made it possible for harmful content to spread quickly and widely on the internet. Harmful memes can range from spreading hate speech promoting violence, and causing emotional distress to individuals or communities. These memes are often designed to be misleading, manipulative, and controversial, making it challenging to detect and remove them from online platforms. Previous studies focused on how to fuse visual and language modalities to capture contextual information. However, meme analysis still severely suffers from data deficiency, resulting in insufficient learning of fusion modules. Further, using conventional pretrained encoders for text and images exhibits a greater semantic gap in feature spaces and leads to low performance. To address these gaps, this paper reformulates a harmful meme analysis as an auto-filling and presents a prompt-based approach to identify harmful memes. Specifically, we first transform multimodal data to a single (i.e., textual) modality by generating the captions and attributes of the visual data and then prepend the textual data in the prompt-based pre-trained language model. Experimental results on two benchmark harmful memes datasets demonstrate that our method outperformed state-of-the-art methods. We conclude with the transferability and robustness of our approach to identify creative harmful memes.

The Harmonic Memory: a Knowledge Graph of harmonic patterns as a trustworthy framework for computational creativity

Computationally creative systems for music have recently achieved impressive results, fuelled by progress in generative machine learning. However, black-box approaches have raised fundamental concerns for ethics, accountability, explainability, and musical plausibility. To enable trustworthy machine creativity, we introduce the Harmonic Memory, a Knowledge Graph (KG) of harmonic patterns extracted from a large and heterogeneous musical corpus. By leveraging a cognitive model of tonal harmony, chord progressions are segmented into meaningful structures, and patterns emerge from their comparison via harmonic similarity. Akin to a music memory, the KG holds temporal connections between consecutive patterns, as well as salient similarity relationships. After demonstrating the validity of our choices, we provide examples of how this design enables novel pathways for combinational creativity. The memory provides a fully accountable and explainable framework to inspire and support creative professionals – allowing for the discovery of progressions consistent with given criteria, the recomposition of harmonic sections, but also the co-creation of new progressions.

SA-Fusion: Multimodal Fusion Approach for Web-based Human-Computer Interaction in the Wild

Web-based AR technology has broadened human-computer interaction scenes from traditional mechanical devices and flat screens to the real world, resulting in unconstrained environmental challenges such as complex backgrounds, extreme illumination, depth range differences, and hand-object interaction. The previous hand detection and 3D hand pose estimation methods are usually based on single modality such as RGB or depth data, which are not available in some scenarios in unconstrained environments due to the differences between the two modalities. To address this problem, we propose a multimodal fusion approach, named Scene-Adapt Fusion (SA-Fusion), which can fully utilize the complementarity of RGB and depth modalities in web-based HCI tasks. SA-Fusion can be applied in existing hand detection and 3D hand pose estimation frameworks to boost their performance, and can be further integrated into the prototyping AR system to construct a web-based interactive AR application for unconstrained environments. To evaluate the proposed multimodal fusion method, we conduct two user studies on CUG Hand and DexYCB dataset, to demonstrate its effectiveness in terms of accurately detecting hand and estimating 3D hand pose in unconstrained environments and hand-object interaction.

A Prompt Log Analysis of Text-to-Image Generation Systems

Recent developments in large language models (LLM) and generative AI have unleashed the astonishing capabilities of text-to-image generation systems to synthesize high-quality images that are faithful to a given reference text, known as a “prompt”. These systems have immediately received lots of attention from researchers, creators, and common users. Despite the plenty of efforts to improve the generative models, there is limited work on understanding the information needs of the users of these systems at scale. We conduct the first comprehensive analysis of large-scale prompt logs collected from multiple text-to-image generation systems. Our work is analogous to analyzing the query logs of Web search engines, a line of work that has made critical contributions to the glory of the Web search industry and research. Compared with Web search queries, text-to-image prompts are significantly longer, often organized into special structures that consist of the subject, form, and intent of the generation tasks and present unique categories of information needs. Users make more edits within creation sessions, which present remarkable exploratory patterns. There is also a considerable gap between the user-input prompts and the captions of the images included in the open training data of the generative models. Our findings provide concrete implications on how to improve text-to-image generation systems for creation purposes.

CAM: A Large Language Model-based Creative Analogy Mining Framework

Analogies inspire creative solutions to problems, and facilitate the creative expression of ideas and the explanation of complex concepts. They have widespread applications in scientific innovation, creative writing, and education. The ability to discover creative analogies that are not explicitly mentioned but can be inferred from the web is highly desirable to power all such applications dynamically and augment human creativity. Recently, Large Pre-trained Language Models (PLMs), trained on massive Web data, have shown great promise in generating mostly known analogies that are explicitly mentioned on the Web. However, it is unclear how they could be leveraged for mining creative analogies not explicitly mentioned on the Web. We address this challenge and propose Creative Analogy Mining (CAM), a novel framework for mining creative analogies, which consists of the following three main steps: 1) Generate analogies using PLMs with effectively designed prompts, 2) Evaluate their quality using scoring functions, and 3) Refine the low-quality analogies by another round of prompt-based generation. We propose both unsupervised and supervised instantiations of the framework so that it can be used even without any annotated data. Based on human evaluation using Amazon Mechanical Turk, we find that our unsupervised framework can mine 13.7% highly-creative and 56.37% somewhat-creative analogies. Moreover, our supervised scores are generally better than the unsupervised ones and correlate moderately with human evaluators, indicating that they would be even more effective at mining creative analogies. These findings also shed light on the creativity of PLMs 1.

Tangible Web: An Interactive Immersion Virtual Reality Creativity System that Travels Across Reality

With the advancement of virtual reality (VR) technology, virtual displays have become integral to how museums, galleries, and other tourist destinations present their collections to the public. However, the current lack of immersion in virtual reality displays limits the user’s ability to experience and appreciate its aesthetics. This paper presents a case study of a creative approach taken by a tourist attraction venue in developing a physical network system that allows visitors to enhance VR’s aesthetic aspects based on environmental parameters gathered by external sensors. Our system was collaboratively developed through interviews and sessions with twelve stakeholder groups interested in art and exhibitions. This paper demonstrates how our technological advancements in interaction, immersion and visual attractiveness surpass those of earlier virtual display generations. Through multimodal interaction, we aim to encourage innovation on the Web and create more visually appealing and engaging virtual displays. It is hoped that the greater online art community will gain fresh insight into how people interact with virtual worlds as a result of this work.

Coherent Topic Modeling for Creative Multimodal Data on Social Media

The creative web is all about combining different types of media to create a unique and engaging online experience. Multimodal data, such as text and images, is a key component in the creative web. Social media posts that incorporate both text descriptions and images offer a wealth of information and context. Text in social media posts typically relates to one topic, while images often convey information about multiple topics due to the richness of visual content. Despite this potential, many existing multimodal topic models do not take these criteria into account, resulting in poor quality topics being generated. Therefore, we proposed a Coherent Topic modeling for Multimodal Data (CTM-MM), which takes into account that text in social media posts typically relates to one topic, while images can contain information about multiple topics. Our experimental results show that CTM-MM outperforms traditional multimodal topic models in terms of classification and topic coherence.

SESSION: Web4Good

Improving Health Mention Classification Through Emphasising Literal Meanings: A Study Towards Diversity and Generalisation for Public Health Surveillance

People often use disease or symptom terms on social media and online forums in ways other than to describe their health. Thus the NLP health mention classification (HMC) task aims to identify posts where users are discussing health conditions literally, not figuratively. Existing computational research typically only studies health mentions within well-represented groups in developed nations. Developing countries with limited health surveillance abilities fail to benefit from such data to manage public health crises. To advance the HMC research and benefit more diverse populations, we present the Nairaland health mention dataset (NHMD), a new dataset collected from a dedicated web forum for Nigerians. NHMD consists of 7,763 manually labelled posts extracted based on four prevalent diseases (HIV/AIDS, Malaria, Stroke and Tuberculosis) in Nigeria. With NHMD, we conduct extensive experiments using current state-of-the-art models for HMC and identify that, compared to existing public datasets, NHMD contains out-of-distribution examples. Hence, it is well suited for domain adaptation studies. The introduction of the NHMD dataset imposes better diversity coverage of vulnerable populations and generalisation for HMC tasks in a global public health surveillance setting. Additionally, we present a novel multi-task learning approach for HMC tasks by combining literal word meaning prediction as an auxiliary task. Experimental results demonstrate that the proposed approach outperforms state-of-the-art methods statistically significantly (p < 0.01, Wilcoxon test) in terms of F1 score over the state-of-the-art and shows that our new dataset poses a strong challenge to the existing HMC methods.

Facility Relocation Search For Good: When Facility Exposure Meets User Convenience

In this paper, we propose a novel facility relocation problem where facilities (and their services) are portable, which is a combinatorial search problem with many practical applications. Given a set of users, a set of existing facilities, and a set of potential sites, we decide which of the existing facilities to relocate to potential sites, such that two factors are satisfied: (1) facility exposure: facilities after relocation have balanced exposure, namely serving equivalent numbers of users; (2) user convenience: it is convenient for users to access the nearest facility, which provides services with shorter travel distance. This problem is motivated by applications such as dynamically redistributing vaccine resources to align supply with demand for different vaccination centers, and relocating the bike sharing sites daily to improve the transportation efficiency. We first prove that this problem is NP-hard, and then we propose two algorithms: a non-learning best response algorithm () and a reinforcement learning algorithm (). In particular, the best response algorithm finds a Nash equilibrium to balance the facility-related and the user-related goals. To avoid being confined to only one Nash equilibrium, as found in the method, we also propose the reinforcement learning algorithm for long-term benefits, where each facility is an agent and we determine whether a facility needs to be relocated or not. To verify the effectiveness of our methods, we adopt multiple metrics to evaluate not only our objective, but also several other facility exposure equity and user convenience metrics to understand the benefits after facility relocation. Finally, comprehensive experiments using real-world datasets provide insights into the effectiveness of the two algorithms in practice.

A Multi-task Model for Emotion and Offensive Aided Stance Detection of Climate Change Tweets

In this work, we address the United Nations Sustainable Development Goal 13: Climate Action by focusing on identifying public attitudes toward climate change on social media platforms such as Twitter. Climate change is threatening the health of the planet and humanity. Public engagement is critical to address climate change. However, climate change conversations on Twitter tend to polarize beliefs, leading to misinformation and fake news that influence public attitudes, often dividing them into climate change believers and deniers. Our paper proposes an approach to classify the attitude of climate change tweets (believe/deny/ambiguous) to identify denier statements on Twitter. Most existing approaches for detecting stances and classifying climate change tweets either overlook deniers’ tweets or do not have a suitable architecture. The relevant literature suggests that emotions and higher levels of toxicity are prevalent in climate change Twitter conversations, leading to a delay in appropriate climate action. Therefore, our work focuses on learning stance detection (main task) while exploiting the auxiliary tasks of recognizing emotions and offensive utterances. We propose a multimodal multitasking framework MEMOCLiC that captures the input data using different embedding techniques and attention frameworks, and then incorporates the learned emotional and offensive expressions to obtain an overall representation of the features relevant to the stance of the input tweet. Extensive experiments conducted on a novel curated climate change dataset and two benchmark stance detection datasets (SemEval-2016 and ClimateStance-2022) demonstrate the effectiveness of our approach.

Learning Faithful Attention for Interpretable Classification of Crisis-Related Microblogs under Constrained Human Budget

The recent widespread use of social media platforms has created convenient ways to obtain and spread up-to-date information during crisis events such as disasters. Time-critical analysis of crisis data can help human organizations gain actionable information and plan for aid responses. Many existing studies have proposed methods to identify informative messages and categorize them into different humanitarian classes. Advanced neural network architectures tend to achieve state-of-the-art performance, but the model decisions are opaque. While attention heatmaps show insights into the model’s prediction, some studies found that standard attention does not provide meaningful explanations. Alternatively, recent works proposed interpretable approaches for the classification of crisis events that rely on human rationales to train and extract short snippets as explanations. However, the rationale annotations are not always available, especially in real-time situations for new tasks and events. In this paper, we propose a two-stage approach to learn the rationales under minimal human supervision and derive faithful machine attention. Extensive experiments over four crisis events show that our model is able to obtain better or comparable classification performance (∼ 86% Macro-F1) to baselines and faithful attention heatmaps using only 40-50% human-level supervision. Further, we employ a zero-shot learning setup to detect actionable tweets along with actionable word snippets as rationales.

Exploring Social Media for Early Detection of Depression in COVID-19 Patients

The COVID-19 pandemic has caused substantial damage to global health. Even though three years have passed, the world continues to struggle with the virus. Concerns are growing about the impact of COVID-19 on the mental health of infected individuals, who are more likely to experience depression, which can have long-lasting consequences for both the affected individuals and the world. Detection and intervention at an early stage can reduce the risk of depression in COVID-19 patients. In this paper, we investigated the relationship between COVID-19 infection and depression through social media analysis. Firstly, we managed a dataset of COVID-19 patients that contains information about their social media activity both before and after infection. Secondly, We conducted an extensive analysis of this dataset to investigate the characteristic of COVID-19 patients with a higher risk of depression. Thirdly, we proposed a deep neural network for early prediction of depression risk. This model considers daily mood swings as a psychiatric signal and incorporates textual and emotional characteristics via knowledge distillation. Experimental results demonstrate that our proposed framework outperforms baselines in detecting depression risk, with an AUROC of 0.9317 and an AUPRC of 0.8116. Our model has the potential to enable public health organizations to initiate prompt intervention with high-risk patients.

Attacking Fake News Detectors via Manipulating News Social Engagement

Social media is one of the main sources for news consumption, especially among the younger generation. With the increasing popularity of news consumption on various social media platforms, there has been a surge of misinformation which includes false information or unfounded claims. As various text- and social context-based fake news detectors are proposed to detect misinformation on social media, recent works start to focus on the vulnerabilities of fake news detectors. In this paper, we present the first adversarial attack framework against Graph Neural Network (GNN)-based fake news detectors to probe their robustness. Specifically, we leverage a multi-agent reinforcement learning (MARL) framework to simulate the adversarial behavior of fraudsters on social media. Research has shown that in real-world settings, fraudsters coordinate with each other to share different news in order to evade the detection of fake news detectors. Therefore, we modeled our MARL framework as a Markov Game with bot, cyborg, and crowd worker agents, which have their own distinctive cost, budget, and influence. We then use deep Q-learning to search for the optimal policy that maximizes the rewards. Extensive experimental results on two real-world fake news propagation datasets demonstrate that our proposed framework can effectively sabotage the GNN-based fake news detector performance. We hope this paper can provide insights for future research on fake news detection.

Cross-center Early Sepsis Recognition by Medical Knowledge Guided Collaborative Learning for Data-scarce Hospitals

There are significant regional inequities in health resources around the world. It has become one of the most focused topics to improve health services for data-scarce hospitals and promote health equity through knowledge sharing among medical institutions. Because electronic medical records (EMRs) contain sensitive personal information, privacy protection is unavoidable and essential for multi-hospital collaboration. In this paper, for a common disease in ICU patients, sepsis, we propose a novel cross-center collaborative learning framework guided by medical knowledge, SofaNet, to achieve early recognition of this disease. The Sepsis-3 guideline, published in 2016, defines that sepsis can be diagnosed by satisfying both suspicion of infection and Sequential Organ Failure Assessment (SOFA) greater than or equal to 2. Based on this knowledge, SofaNet adopts a multi-channel GRU structure to predict SOFA values of different systems, which can be seen as an auxiliary task to generate better health status representations for sepsis recognition. Moreover, we only achieve feature distribution alignment in the hidden space during cross-center collaborative learning, which ensures secure and compliant knowledge transfer without raw data exchange. Extensive experiments on two open clinical datasets, MIMIC-III and Challenge, demonstrate that SofaNet can benefit early sepsis recognition when hospitals only have limited EMRs.

ContrastFaux: Sparse Semi-supervised Fauxtography Detection on the Web using Multi-view Contrastive Learning

The widespread misinformation on the Web has raised many concerns with serious societal consequences. In this paper, we study a critical type of online misinformation, namely fauxtography, where the image and associated text of a social media post jointly convey a questionable or false sense. In particular, we focus on a sparse semi-supervised fauxtography detection problem, which aims to accurately identify fauxtography by only using the sparsely annotated ground truth labels of social media posts. Our problem is motivated by the key limitation of current fauxtography detection approaches that often require a large amount of expensive and inefficient manual annotations to train an effective fauxtography detection model. We identify two key technical challenges in solving the problem: 1) it is non-trivial to train an accurate detection model given the sparse fauxtography annotations, and 2) it is difficult to extract the heterogeneous and complicated fauxtography features from the multi-modal social media posts for accurate fauxtography detection. To address the above challenges, we propose ContrastFaux, a multi-view contrastive learning framework that jointly explores the sparse fauxtography annotations and the cross-modal fauxtography feature similarity between the image and text in multi-modal posts to accurately detect fauxtography on social media. Evaluation results on two social media datasets demonstrate that ContrastFaux consistently outperforms state-of-the-art deep learning and semi-supervised learning fauxtography detection baselines by achieving the highest fauxtography detection accuracy.

CaML: Carbon Footprinting of Household Products with Zero-Shot Semantic Text Similarity

Products contribute to carbon emissions in each phase of their life cycle, from manufacturing to disposal. Estimating the embodied carbon in products is a key step towards understanding their impact, and undertaking mitigation actions. Precise carbon attribution is challenging at scale, requiring both domain expertise and granular supply chain data. As a first-order approximation, standard reports use Economic Input-Output based Life Cycle Assessment (EIO-LCA) which estimates carbon emissions per dollar at an industry sector level using transactions between different parts of the economy. EIO-LCA models map products to an industry sector, and uses the corresponding carbon per dollar estimates to calculate the embodied carbon footprint of a product. An LCA expert needs to map each product to one of upwards of 1000 potential industry sectors. To reduce the annotation burden, the standard practice is to group products by categories, and map categories to their corresponding industry sector. We present CaML, an algorithm to automate EIO-LCA using semantic text similarity matching by leveraging the text descriptions of the product and the industry sector. CaML uses a pre-trained sentence transformer model to rank the top-5 matches, and asks a human to check if any of them are a good match. We annotated 40K products with non-experts. Our results reveal that pre-defined product categories are heterogeneous with respect to EIO-LCA industry sectors, and lead to a large mean absolute percentage error (MAPE) of 51% in kgCO2e/$. CaML outperforms the previous manually intensive method, yielding a MAPE of 22% with no domain labels (zero-shot). We compared annotations of a small sample of 210 products with LCA experts, and find that CaML accuracy is comparable to that of annotations by non-experts.

Identifying Checkworthy CURE Claims on Twitter

Medical claims on social media, if left unchecked, have the potential to directly affect the well-being of consumers of online health information. However, existing studies on claim detection do not specifically focus on medical cure aspects, neither do they address if a cure claim is “checkworthy", an indicator of whether a claim is potentially beneficial or harmful, if unchecked. In this paper, we address these limitations by compiling CW-CURE, a novel dataset of CURE tweets, namely tweets containing claims on prevention, diagnoses, risks, treatments, and cures of medical conditions. CW-CURE contains tweets on four major health conditions, namely, Alzheimer’s disease, Cancer, Diabetes, and Depression annotated for claims, their “checkworthiness", as well as the different types of claims such as quantitative claim, correlation/causation, personal experience, and future prediction. We describe our processing pipeline for compiling CW-CURE and present classification results on CURE tweets using transformer-based models. In particular, we harness claim-type information obtained with zero-shot learning to show significant improvements in checkworthiness identification. Through CW-CURE, we hope to enable research on models for effective identification and flagging of impactful CURE content, to safeguard the public’s consumption of medical content online.

MSQ-BioBERT: Ambiguity Resolution to Enhance BioBERT Medical Question-Answering

Question answering (QA) is a task in the field of natural language processing (NLP) and information retrieval, which has pivotal applications in areas such as online reading comprehension and web search engines. Currently, Bidirectional Encoder Representations from Transformers (BERT) and its biomedical variation (BioBERT) achieve impressive results on the reading comprehension QA datasets and medical-related QA datasets, and so they are widely used for a variety of passage-based QA tasks. However, their performances rapidly deteriorate when encountering passage and context ambiguities. This issue is prevalent and unavoidable in many fields, notably the web-based medical field. In this paper, we introduced a novel approach called the Multiple Synonymous Questions BioBERT (MSQ-BioBERT), which integrates question augmentation, rather than the typical single question used by traditional BioBERT, to elevate BioBERT’s performance on medical QA tasks. In addition, we constructed an ambiguous medical dataset based on the information from Wikipedia web. Experiments with both this web-based constructed medical dataset and open biomedical datasets demonstrate the significant performance gains of the MSQ-BioBERT approach, showcasing a new method for addressing ambiguity in medical QA tasks.

Interpreting wealth distribution via poverty map inference using multimodal data

Poverty maps are essential tools for governments and NGOs to track socioeconomic changes and adequately allocate infrastructure and services in places in need. Sensor and online crowd-sourced data combined with machine learning methods have provided a recent breakthrough in poverty map inference. However, these methods do not capture local wealth fluctuations, and are not optimized to produce accountable results that guarantee accurate predictions to all sub-populations. Here, we propose a pipeline of machine learning models to infer the mean and standard deviation of wealth across multiple geographically clustered populated places, and illustrate their performance in Sierra Leone and Uganda. These models leverage seven independent and freely available feature sources based on satellite images, and metadata collected via online crowd-sourcing and social media. Our models show that combined metadata features are the best predictors of wealth in rural areas, outperforming image-based models, which are the best for predicting the highest wealth quintiles. Our results recover the local mean and variation of wealth, and correctly capture the positive yet non-monotonous correlation between them. We further demonstrate the capabilities and limitations of model transfer across countries and the effects of data recency and other biases. Our methodology provides open tools to build towards more transparent and interpretable models to help governments and NGOs to make informed decisions based on data availability, urbanization level, and poverty thresholds.

Breaking Filter Bubble: A Reinforcement Learning Framework of Controllable Recommender System

In the information-overloaded era of the Web, recommender systems that provide personalized content filtering are now the mainstream portal for users to access Web information. Recommender systems deploy machine learning models to learn users’ preferences from collected historical data, leading to more centralized recommendation results due to the feedback loop. As a result, it will harm the ranking of content outside the narrowed scope and limit the options seen by users. In this work, we first conduct data analysis from a graph view to observe that the users’ feedback is restricted to limited items, verifying the phenomenon of centralized recommendation. We further develop a general simulation framework to derive the procedure of the recommender system, including data collection, model learning, and item exposure, which forms a loop. To address the filter bubble issue under the feedback loop, we then propose a general and easy-to-use reinforcement learning-based method, which can adaptively select few but effective connections between nodes from different communities as the exposure list. We conduct extensive experiments in the simulation framework based on large-scale real-world datasets. The results demonstrate that our proposed reinforcement learning-based control method can serve as an effective solution to alleviate the filter bubble and the separated communities induced by it. We believe the proposed framework of controllable recommendation in this work can inspire not only the researchers of recommender systems, but also a broader community concerned with artificial intelligence algorithms’ impact on humanity, especially for those vulnerable populations on the Web.

CollabEquality: A Crowd-AI Collaborative Learning Framework to Address Class-wise Inequality in Web-based Disaster Response

Web-based disaster response (WebDR) is emerging as a pervasive approach to acquire real-time situation awareness of disaster events by collecting timely observations from the Web (e.g., social media). This paper studies a class-wise inequality problem in WebDR applications where the objective is to address the limitation of current WebDR solutions that often have imbalanced classification performance across different classes. To address such a limitation, this paper explores the collaborative strengths of the diversified yet complementary biases of AI and crowdsourced human intelligence to ensure a more balanced and accurate performance for WebDR applications. However, two critical challenges exist: 1) it is difficult to identify the imbalanced AI results without knowing the ground-truth WebDR labels a priori; ii) it is non-trivial to address the class-wise inequality problem using potentially imperfect crowd labels. To address the above challenges, we develop CollabEquality, an inequality-aware crowd-AI collaborative learning framework that carefully models the inequality bias of both AI and human intelligence from crowdsourcing systems into a principled learning framework. Extensive experiments on two real-world WebDR applications demonstrate that CollabEquality consistently outperforms the state-of-the-art baselines by significantly reducing class-wise inequality while improving the WebDR classification accuracy.

The Impact of Covid-19 on Online Discussions: the Case Study of the Sanctioned Suicide Forum

The COVID-19 pandemic has been at the center of the lives of many of us for at least a couple of years, during which periods of isolation and lockdowns were common. How all that affected our mental well-being, especially the ones’ who were already in distress? To investigate the matter we analyse the online discussions on Sanctioned Suicide, a forum where users discuss suicide-related topics freely. We collected discussions starting from March 2018 (before pandemic) up to July 2022, for a total of 53K threads with 700K comments and 16K users. We investigate the impact of COVID-19 on the discussions in the forum. The data show that covid, while being present in the discussions, especially during the first lockdown, has not been the main reason why new users registered to the forum. However, covid appears to be indirectly connected to other causes of distress for the users, i.e. anxiety for the economy.

EDNet: Attention-Based Multimodal Representation for Classification of Twitter Users Related to Eating Disorders

Social media platforms provide rich data sources in several domains. In mental health, individuals experiencing an Eating Disorder (ED) are often hesitant to seek help through conventional healthcare services. However, many people seek help with diet and body image issues on social media. To better distinguish at-risk users who may need help for an ED from those who are simply commenting on ED in social environments, highly sophisticated approaches are required. Assessment of ED risks in such a situation can be done in various ways, and each has its own strengths and weaknesses. Hence, there is a need for and potential benefit of a more complex multimodal approach. To this end, we collect historical tweets, user biographies, and online behaviours of relevant users from Twitter, and generate a reasonably large labelled benchmark dataset. Thereafter, we develop an advanced multimodal deep learning model called EDNet using these data to identify the different types of users with ED engagement (e.g., potential ED sufferers, healthcare professionals, or communicators) and distinguish them from those not experiencing EDs on Twitter. EDNet consists of five deep neural network layers. With the help of its embedding, representation and behaviour modeling layers, it effectively learns the multimodalities of social media. In our experiments, EDNet consistently outperforms all the baseline techniques by significant margins. It achieves an accuracy of up to 94.32% and F1 score of up to 93.91% F1 score. To the best of our knowledge, this is the first such study to propose a multimodal approach for user-level classification according to their engagement with ED content on social media.

MoleRec: Combinatorial Drug Recommendation with Substructure-Aware Molecular Representation Learning

Combinatorial drug recommendation involves recommending a personalized combination of medication (drugs) to a patient over his/her longitudinal history, which essentially aims at solving a combinatorial optimization problem that pursues high accuracy under the safety constraint. Among existing learning-based approaches, the association between drug substructures (i.e., a sub-graph of the molecule that contributes to certain chemical effect) and the target disease is largely overlooked, though the function of drugs in fact exhibits strong relevance with particular substructures. To address this issue, we propose a molecular substructure-aware encoding method entitled MoleRec that entails a hierarchical architecture aimed at modeling inter-substructure interactions and individual substructures’ impact on patient’s health condition, in order to identify those substructures that really contribute to healing patients. Specifically, MoleRec learns to attentively pooling over substructure representations which will be element-wisely re-scaled by the model’s inferred relevancy with a patient’s health condition to obtain a prior-knowledge-informed drug representation. We further design a weight annealing strategy for drug-drug-interaction (DDI) objective to adaptively control the balance between accuracy and safety criteria throughout training. Experiments on the MIMIC-III dataset demonstrate that our approach achieves new state-of-the-art performance w.r.t. four accuracy and safety metrics. Our source code is publicly available at https://github.com/yangnianzu0515/MoleRec.

Detecting and Limiting Negative User Experiences in Social Media Platforms

Item ranking is important to a social media platform’s success. The order in which posts, videos, messages, comments, ads, used products, notifications are presented to a user greatly affects the time spent on the platform, how often they visit it, how much they interact with each other, and the quantity and quality of the content they post. To this end, item ranking algorithms use models that predict the likelihood of different events, e.g., the user liking, sharing, commenting on a video, clicking/converting on an ad, or opening the platform’s app from a notification. Unfortunately, by solely relying on such event-prediction models, social media platforms tend to over optimize for short-term objectives and ignore the long-term effects. In this paper, we propose an approach that aims at improving item ranking long-term impact. The approach primarily relies on an ML model that predicts negative user experiences. The model utilizes all available UI events: the details of an action can reveal how positive or negative the user experience has been; for example, a user writing a lengthy report asking for a given video to be taken down, likely had a very negative experience. Furthermore, the model takes into account detected integrity (e.g., hostile speech or graphic violence) and quality (e.g., click or engagement bait) issues with the content. Note that those issues can be perceived very differently from different users. Therefore, developing a personalized model, where a prediction refers to a specific user for a specific piece of content at a specific point in time, is a fundamental design choice in our approach. Besides the personalized ML model, our approach consists of two more pieces: (a) the way the personalized model is integrated with an item ranking algorithm and (b) the metrics, methodology, and success criteria for the long term impact of detecting and limiting negative user experiences. Our evaluation process uses extensive A/B testing on the Facebook platform: we compare the impact of our approach in treatment groups against production control groups. The AB test results indicate a 5% to 50% reduction in hides, reports, and submitted feedback. Furthermore, we compare against a baseline that does not include some of the crucial elements of our approach: the comparison shows our approach has a 100x to 30x lower False Positive Ratio than a baseline. Lastly, we present the results from a large scale survey, where we observe a statistically significant improvement of 3 to 6 percent in users’ sentiment regarding content suffering from nudity, clickbait, false / misleading, witnessing-hate, and violence issues.

Learning like human annotators: Cyberbullying detection in lengthy social media sessions

The inherent characteristic of cyberbullying of being a recurrent attitude calls for the investigation of the problem by looking at social media sessions as a whole, beyond just isolated social media posts. However, the lengthy nature of social media sessions challenges the applicability and performance of session-based cyberbullying detection models. This is especially true when one aims to use state-of-the-art Transformer-based pre-trained language models, which only take inputs of a limited length. In this paper, we address this limitation of transformer models by proposing a conceptually intuitive framework called LS-CB, which enables cyberbullying detection from lengthy social media sessions. LS-CB relies on the intuition that we can effectively aggregate the predictions made by transformer models on smaller sliding windows extracted from lengthy social media sessions, leading to an overall improved performance. Our extensive experiments with six transformer models on two session-based datasets show that LS-CB consistently outperforms three types of competitive baselines including state-of-the-art cyberbullying detection models. In addition, we conduct a set of qualitative analyses to validate the hypotheses that cyberbullying incidents can be detected through aggregated analysis of smaller chunks derived from lengthy social media sessions (H1), and that cyberbullying incidents can occur at different points of the session (H2), hence positing that frequently used text truncation strategies are suboptimal compared to relying on holistic views of sessions. Our research in turn opens an avenue for fine-grained cyberbullying detection within sessions in future work.

On Detecting Policy-Related Political Ads: An Exploratory Analysis of Meta Ads in 2022 French Election

Online political advertising has become the cornerstone of political campaigns. The budget spent solely on political advertising in the U.S. has increased by more than 100% from $ 700 million during the 2017-2018 U.S. election cycle to $ 1.6 billion during the 2020 U.S. presidential elections. Naturally, the capacity offered by online platforms to micro-target ads with political content has been worrying lawmakers, journalists, and online platforms, especially after the 2016 U.S. presidential election, where Cambridge Analytica has targeted voters with political ads congruent with their personality.

To curb such risks, both online platforms and regulators (through the DSA act proposed by the European Commission) have agreed that researchers, journalists, and civil society need to be able to scrutinize the political ads running on large online platforms. Consequently, online platforms such as Meta and Google have implemented Ad Libraries that contain information about all political ads running on their platforms. This is the first step on a long path. Due to the volume of available data, it is impossible to go through these ads manually, and we now need automated methods and tools to assist in the scrutiny of political ads.

In this paper, we focus on political ads that are related to policy. Understanding which policies politicians or organizations promote and to whom is essential in determining dishonest representations. This paper proposes automated methods based on pre-trained models to classify ads in 14 main policy groups identified by the Comparative Agenda Project (CAP). We discuss several inherent challenges that arise. Finally, we analyze policy-related ads featured on Meta platforms during the 2022 French presidential elections period.

Graph-based Village Level Poverty Identification

Poverty status identification is the first obstacle to eradicating poverty. Village-level poverty identification is very challenging due to the arduous field investigation and insufficient information. The development of the Web infrastructure and its modeling tools provides fresh approaches to identifying poor villages. Upon those techniques, we build a village graph for village poverty status identification. By modeling the village connections as a graph through the geographic distance, we show the correlation between village poverty status and its graph topological position and identify two key factors (Centrality, Homophily Decaying effect) for identifying villages. We further propose the first graph-based method to identify poor villages. It includes a global Centrality2Vec module to embed village centrality into the dense vector and a local graph distance convolution module that captures the decaying effect. In this paper, we make the first attempt to interpret and identify village-level poverty from a graph perspective.

Mapping Flood Exposure, Damage, and Population Needs Using Remote and Social Sensing: A Case Study of 2022 Pakistan Floods

The devastating 2022 floods in Pakistan resulted in a catastrophe impacting millions of people and destroying thousands of homes. While disaster management efforts were taken, crisis responders struggled to understand the country-wide flood extent, population exposure, urgent needs of affected people, and various types of damage. To tackle this challenge, we leverage remote and social sensing with geospatial data using state-of-the-art machine learning techniques for text and image processing. Our satellite-based analysis over a one-month period (25 Aug–25 Sep) revealed that 11.48% of Pakistan was inundated. When combined with geospatial data, this meant 18.9 million people were at risk across 160 districts in Pakistan, with adults constituting 50% of the exposed population. Our social sensing data analysis surfaced 106.7k reports pertaining to deaths, injuries, and concerns of the affected people. To understand the urgent needs of the affected population, we analyzed tweet texts and found that South Karachi, Chitral and North Waziristan required the most basic necessities like food and shelter. Further analysis of tweet images revealed that Lasbela, Rajanpur, and Jhal Magsi had the highest damage reports normalized by their population. These damage reports were found to correlate strongly with affected people reports and need reports, achieving an R-Square of 0.96 and 0.94, respectively. Our extensive study shows that combining remote sensing, social sensing, and geospatial data can provide accurate and timely information during a disaster event, which is crucial in prioritizing areas for immediate and gradual response.

Web Information Extraction for Social Good: Food Pantry Answering As an Example

Social determinants of health (SDH) are the conditions in which we are born, live, work, and age. Food insecurity (FI) is an important domain of SDH. FI is associated with poor health outcomes. Food bank/pantry (food pantry) directly addresses FI. Improving the availability and quality of food from food pantries could reduce FI, leading to improved health outcomes. However, it is difficult for a client to access food pantry information. In this study, we built a food pantry answering framework by combining location-aware information retrieval, web information extraction and domain-specific answering. Our proposed framework first retrieves pantry candidates based on geolocation of the client, and utilizes structural information from markup language to extract semantic chunks related to six common client requests. We use BERT and RoBERTa as information extraction models and compare three different web page segmentation methods in the experiments.

Moral Narratives Around the Vaccination Debate on Facebook

Vaccine hesitancy is a complex issue with psychological, cultural, and even societal factors entangled in the decision-making process. The narrative around this process is captured in our everyday interactions; social media data offer a direct and spontaneous view of peoples’ argumentation. Here, we analysed more than 500,000 public posts and comments from Facebook Pages dedicated to the topic of vaccination to study the role of moral values and, in particular, the understudied role of the Liberty moral foundation from the actual user-generated text. We operationalise morality by employing the Moral Foundations Theory, while our proposed framework is based on recurrent neural network classifiers with a short memory and entity linking information. Our findings show that the principal moral narratives around the vaccination debate focus on the values of Liberty, Care, and Authority. Vaccine advocates urge compliance with the authorities as prosocial behaviour to protect society. On the other hand, vaccine sceptics mainly build their narrative around the value of Liberty, advocating for the right to choose freely whether to adhere or not to the vaccination. We contribute to the automatic understanding of vaccine hesitancy drivers emerging from user-generated text, providing concrete insights into the moral framing around vaccination decision-making. Especially in emergencies such as the Covid-19 pandemic, contrary to traditional surveys, these insights can be provided contemporary to the event, helping policymakers craft communication campaigns that adequately address the concerns of the hesitant population.

Gender Pay Gap in Sports on a Fan-Request Celebrity Video Site

The internet is often thought of as a democratizer, enabling equality in aspects such as pay, as well as a tool introducing novel communication and monetization opportunities. In this study we examine athletes on Cameo, a website that enables bi-directional fan-celebrity interactions, questioning whether the well-documented gender pay gaps in sports persist in this digital setting. Traditional studies into gender pay gaps in sports are mostly in a centralized setting where an organization decides the pay for the players, while Cameo facilitates grass-roots fan engagement where fans pay for video messages from their preferred athletes. The results showed that even on such a platform gender pay gaps persist, both in terms of cost-per-message, and in the number of requests, proxied by number of ratings. For instance, we find that female athletes have a median pay of 30$ per-video, while the same statistic is 40$ for men. The results also contribute to the study of parasocial relationships and personalized fan engagements over a distance. Something that has become more relevant during the ongoing COVID-19 pandemic, where in-person fan engagement has often been limited.

Knowledge-infused Contrastive Learning for Urban Imagery-based Socioeconomic Prediction

Monitoring sustainable development goals requires accurate and timely socioeconomic statistics, while ubiquitous and frequently-updated urban imagery in web like satellite/street view images has emerged as an important source for socioeconomic prediction. Especially, recent studies turn to self-supervised contrastive learning with manually designed similarity metrics for urban imagery representation learning and further socioeconomic prediction, which however suffers from effectiveness and robustness issues. To address such issues, in this paper, we propose a Knowledge-infused Contrastive Learning (KnowCL) model for urban imagery-based socioeconomic prediction. Specifically, we firstly introduce knowledge graph (KG) to effectively model the urban knowledge in spatiality, mobility, etc., and then build neural network based encoders to learn representations of an urban image in associated semantic and visual spaces, respectively. Finally, we design a cross-modality based contrastive learning framework with a novel image-KG contrastive loss, which maximizes the mutual information between semantic and visual representations for knowledge infusion. Extensive experiments of applying the learnt visual representations for socioeconomic prediction on three datasets demonstrate the superior performance of KnowCL with over 30% improvements on R2 compared with baselines. Especially, our proposed KnowCL model can apply to both satellite and street imagery with both effectiveness and transferability achieved, which provides insights into urban imagery-based socioeconomic prediction.

Leveraging Existing Literature on the Web and Deep Neural Models to Build a Knowledge Graph Focused on Water Quality and Health Risks

A knowledge graph focusing on water quality in relation to health risks posed by water activities (such as diving or swimming) is not currently available. To address this limitation, we first use existing resources to construct a knowledge graph relevant to water quality and health risks using KNowledge Acquisition and Representation Methodology (KNARM). Subsequently, we explore knowledge graph completion approaches for maintaining and updating the graph. Specifically, we manually identify a set of domain-specific UMLS concepts and use them to extract a graph of approximately 75,000 semantic triples from the Semantic MEDLINE database (which contains head-relation-tail triples extracted from PubMed). Using the resulting knowledge graph, we experiment with the KG-BERT approach for graph completion by employing pre-trained BERT/RoBERTa models and also models fine-tuned on a collection of water quality and health risks abstracts retrieved from the Web of Science. Experimental results show that KG-BERT with BERT/RoBERTa models fine-tuned on a domain-specific corpus improves the performance of KG-BERT with pre-trained models. Furthermore, KG-BERT gives better results than several translational distance or semantic matching baseline models.

Believability and Harmfulness Shape the Virality of Misleading Social Media Posts

Misinformation on social media presents a major threat to modern societies. While previous research has analyzed the virality across true and false social media posts, not every misleading post is necessarily equally viral. Rather, misinformation has different characteristics and varies in terms of its believability and harmfulness – which might influence its spread. In this work, we study how the perceived believability and harmfulness of misleading posts are associated with their virality on social media. Specifically, we analyze (and validate) a large sample of crowd-annotated social media posts from Twitter’s Birdwatch platform, on which users can rate the believability and harmfulness of misleading tweets. To address our research questions, we implement an explanatory regression model and link the crowd ratings for believability and harmfulness to the virality of misleading posts on Twitter. Our findings imply that misinformation that is (i) easily believable and (ii) not particularly harmful is associated with more viral resharing cascades. These results offer insights into how different kinds of crowd fact-checked misinformation spreads and suggest that the most viral misleading posts are often not the ones that are particularly concerning from the perspective of public safety. From a practical view, our findings may help platforms to develop more effective strategies to curb the proliferation of misleading posts on social media.

Enhancing Deep Knowledge Tracing with Auxiliary Tasks

Knowledge tracing (KT) is the problem of predicting students’ future performance based on their historical interactions with intelligent tutoring systems. Recent studies have applied multiple types of deep neural networks to solve the KT problem. However, there are two important factors in real-world educational data that are not well represented. First, most existing works augment input representations with the co-occurrence matrix of questions and knowledge components1 (KCs) but fail to explicitly integrate such intrinsic relations into the final response prediction task. Second, the individualized historical performance of students has not been well captured. In this paper, we proposed AT-DKT to improve the prediction performance of the original deep knowledge tracing model with two auxiliary learning tasks, i.e., question tagging (QT) prediction task and individualized prior knowledge (IK) prediction task. Specifically, the QT task helps learn better question representations by predicting whether questions contain specific KCs. The IK task captures students’ global historical performance by progressively predicting student-level prior knowledge that is hidden in students’ historical learning interactions. We conduct comprehensive experiments on three real-world educational datasets and compare the proposed approach to both deep sequential KT models and non-sequential models. Experimental results show that AT-DKT outperforms all sequential models with more than 0.9% improvements of AUC for all datasets, and is almost the second best compared to non-sequential models. Furthermore, we conduct both ablation studies and quantitative analysis to show the effectiveness of auxiliary tasks and the superior prediction outcomes of AT-DKT. To encourage reproducible research, we make our data and code publicly available at https://github.com/pykt-team/pykt-toolkit  2.

Vertical Federated Knowledge Transfer via Representation Distillation for Healthcare Collaboration Networks

Collaboration between healthcare institutions can significantly lessen the imbalance in medical resources across various geographic areas. However, directly sharing diagnostic information between institutions is typically not permitted due to the protection of patients’ highly sensitive privacy. As a novel privacy-preserving machine learning paradigm, federated learning (FL) makes it possible to maximize the data utility among multiple medical institutions. These feature-enrichment FL techniques are referred to as vertical FL (VFL). Traditional VFL can only benefit multi-parties’ shared samples, which strongly restricts its application scope. In order to improve the information-sharing capability and innovation of various healthcare-related institutions, and then to establish a next-generation open medical collaboration network, we propose a unified framework for vertical federated knowledge transfer mechanism (VFedTrans) based on a novel cross-hospital representation distillation component. Specifically, our framework includes three steps. First, shared samples’ federated representations are extracted by collaboratively modeling multi-parties’ joint features with current efficient vertical federated representation learning methods. Second, for each hospital, we learn a local-representation-distilled module, which can transfer the knowledge from shared samples’ federated representations to enrich local samples’ representations. Finally, each hospital can leverage local samples’ representations enriched by the distillation module to boost arbitrary downstream machine learning tasks. The experiments on real-life medical datasets verify the knowledge transfer effectiveness of our framework.

Learning to Simulate Crowd Trajectories with Graph Networks

Crowd stampede disasters often occur, such as recent ones in Indonesia and South Korea, and crowd simulation is particularly important to prevent and avoid such disasters. Most traditional models for crowd simulation, such as the social force model, are hand-designed formulas, which use Newtonian forces to model the interactions between pedestrians. However, such formula-based methods may not be flexible enough to capture the complex interaction patterns in diverse crowd scenarios. Recently, due to the development of the Internet, a large amount of pedestrian movement data has been collected, allowing us to study crowd simulation in a data-driven way. Inspired by the recent success of graph network-based simulation (GNS), we propose a novel method under the framework of GNS, which simulates the crowd in a data-driven way. Specifically, we propose to model the interactions among people and the environment using a heterogeneous graph. Then, we design a heterogeneous gated message-passing network to learn the interaction pattern that depends on the visual field. Finally, the randomness is introduced by modeling the context’s different influences on pedestrians with a probabilistic emission function. Extensive experiments on synthetic data, controlled-environment data and real-world data are performed. Extensive results show that our model can generally capture the three main factors which contribute to crowd trajectories while adapting to the data characteristics beyond the strong assumption of formulas-based methods. As a result, the proposed method outperforms existing methods by a large margin.