Ensemble methods are widely used in many machine learning applications such as classification and recommender systems. However, ensemble methods have been slow to develop in unsupervised domains such as outlier detection [1]. The earliest methods for outlier ensemble analysis include techniques such as feature bagging and isolation forests [5,6]. Subsequently, theoretical foundations were developed for outlier ensembles [2], which turned out to be analogous to those used in classification. Therefore, many ensemble methods from classification can be generalized to outlier detection. However, the unsupervised nature of the outlier detection problem necessitates some changes to these algorithms. For example, subsampling methods need to be replaced by variable subsampling in order to obtain the best results [2]. This is because variable subsampling implicitly explores the parameter space over different base detectors, so that problems associated with the lack of supervision are addressed. Outlier ensemble methods can be either data-centric (in which components use different subsets or subspaces of the data) or model-centric (in which components use different variations of the model design). Examples of data-centric methods include feature bagging and subsampling, whereas examples of model-centric methods include Isolation Forests [6], RandNet [4], and Subspace Histograms [7]. However, techniques like variable subsampling have characteristics of both types of ensembles.
Unsupervised algorithms like outlier detectors are often hard to evaluate because different algorithms may perform better for different choices of parameters. In general, it is not fair to compare base detectors with ensembles, since techniques like variable subsampling almost always improve performance. In such cases, outlier ensembles can be used for the evaluation of outlier detection algorithms [3] by wrapping the base detectors in variable subsampling. A large number of base detectors and their ensemble-centric versions were compared, and the correlations between different detectors were analyzed. This analysis was used to propose TRINITY [3], an ensemble-of-ensembles detector that combines the variable subsampling versions of three base detectors and is robust over a wide variety of data sets. Recently, outlier ensembles have also been used for meta-learning [8], where it is shown how a transfer resource of labeled data sets can be used to optimally combine scores from different detectors for a new unlabeled data set.
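As a concrete illustration of the variable-subsampling idea described above, the following sketch wraps a simple k-NN distance detector in an ensemble whose components are trained on subsamples of randomly varying size. The detector choice, subsample-size range, and score standardization here are illustrative assumptions rather than the exact procedure of [2] or TRINITY [3].

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_outlier_scores(train, test, k=5):
    """Score each test point by its distance to the k-th nearest training point."""
    nn = NearestNeighbors(n_neighbors=k).fit(train)
    dist, _ = nn.kneighbors(test)
    return dist[:, -1]

def variable_subsampling_ensemble(X, n_components=100, size_range=(50, 1000), k=5, seed=0):
    """Average standardized k-NN outlier scores over components trained on
    subsamples whose sizes are drawn uniformly from size_range (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    scores = np.zeros(n)
    for _ in range(n_components):
        s = rng.integers(size_range[0], min(size_range[1], n) + 1)
        idx = rng.choice(n, size=s, replace=False)
        comp = knn_outlier_scores(X[idx], X, k=min(k, s - 1))
        scores += (comp - comp.mean()) / (comp.std() + 1e-12)  # standardize before averaging
    return scores / n_components

# Usage: higher scores indicate stronger outliers.
X = np.random.default_rng(1).normal(size=(2000, 10))
ens_scores = variable_subsampling_ensemble(X, n_components=20)
```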
In this talk, I will present our work on fundamental advances in AI, inspired by interdisciplinary problem statements and societal challenges. I will also highlight our innovation journey that encapsulates both the opportunities and challenges inherent in harnessing the full potential of AI for societal benefit, in particular highlighting the realization of societal impact through translational work and partnerships. Additionally, I will highlight our educational endeavors, emphasizing experiential learning and interdisciplinary approaches as fundamental elements of the student experience.
The rise of Large Language Models (LLMs) has had a huge impact on how users interact with information. Many people argue that the age of search engines as we know them has ended, while others argue that retrieval technology is more relevant than ever before, because we need information to be grounded in sources. In my talk I will argue that both statements are true. I will discuss the multiple relations between LLMs and Information Retrieval: how can they strengthen each other, what challenges do we face, and which directions should our research take?
High-dimensional multiplex graphs are characterized by their high number of complementary and divergent dimensions. The existence of multiple hierarchical latent relations between the graph dimensions poses significant challenges to embedding methods. In particular, the geometric distortions that might occur in the representational space have been overlooked in the literature. This work studies the problem of high-dimensional multiplex graph embedding from a geometric perspective. We find that the node representations reside on highly curved manifolds, thus rendering their exploitation more challenging for downstream tasks. Moreover, our study reveals that increasing the number of graph dimensions can cause further distortions to the highly curved manifolds. To address this problem, we propose a novel multiplex graph embedding method that harnesses hierarchical dimension embedding and Hyperbolic Graph Neural Networks. The proposed approach hierarchically extracts hyperbolic node representations that reside on Riemannian manifolds while gradually learning fewer and more expressive latent dimensions of the multiplex graph. Experimental results on real-world high-dimensional multiplex graphs show that the synergy between hierarchical and hyperbolic embeddings incurs much fewer geometric distortions and brings notable improvements over state-of-the-art approaches on downstream tasks.
Given a sparse tensor, how can we accurately capture complex latent structures inherent in the tensor while maintaining the interpretability of those structures? Tensor decomposition is a fundamental technique for analyzing tensors. Classical tensor models provide multi-linear structures that are easy to interpret, but have limitations in capturing complex structures present in real-world sparse tensors. Recent neural tensor models have extended the capabilities of classical tensor models in capturing complex structures within the data. However, this has come at the cost of interpretability: neural tensor models entangle interactions across and within latent structures in a black-box manner, making it difficult to readily understand the discovered structures. Understanding these structures, however, is crucial in applications such as healthcare, which requires transparency in critical decision-making processes.
To overcome this major limitation and bridge the gap between classical multi-linear models and neural tensor models, we propose Neural Additive Tensor Decomposition (NeAT), an accurate and interpretable neural tensor model for sparse tensors. The main idea of NeAT is to apply neural networks to each latent component in an additive fashion. This not only captures diverse patterns and complex structures in sparse tensors, but also provides a direct and intuitive interpretation of the structures by staying close to the multi-linear tensor model. We conduct extensive experiments on six large real-world sparse tensors. NeAT outperforms state-of-the-art neural tensor models in link prediction, surpassing a linear tensor model by 10% and the second-best neural tensor model by 4% in accuracy. Through ablation studies, we explore various model designs for NeAT to identify key factors that impact generalization. Finally, we qualitatively and quantitatively evaluate the latent patterns discovered by NeAT, demonstrating how to analyze the discovered patterns in real data.
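A minimal sketch of the additive, per-component idea follows. This is a hypothetical simplification rather than the actual NeAT architecture; the embedding sizes, MLP shapes, and the way per-mode embeddings are combined are assumptions.

```python
import torch
import torch.nn as nn

class AdditiveNeuralTD(nn.Module):
    """Toy additive neural tensor model for 3-mode tensors: one small MLP per
    latent component, with the component outputs summed (a sketch of the idea)."""
    def __init__(self, dims, rank=8, emb_dim=16):
        super().__init__()
        self.rank, self.emb_dim = rank, emb_dim
        # One embedding table per mode; each row holds all per-component slices.
        self.embs = nn.ModuleList([nn.Embedding(d, rank * emb_dim) for d in dims])
        self.mlps = nn.ModuleList([
            nn.Sequential(nn.Linear(3 * emb_dim, 32), nn.ReLU(), nn.Linear(32, 1))
            for _ in range(rank)
        ])

    def forward(self, i, j, k):
        # Look up per-mode embeddings and reshape to (batch, rank, emb_dim).
        e = [emb(idx).view(-1, self.rank, self.emb_dim)
             for emb, idx in zip(self.embs, (i, j, k))]
        out = 0.0
        for r, mlp in enumerate(self.mlps):
            # Each component sees only its own slice of every mode's embedding.
            x = torch.cat([e_m[:, r, :] for e_m in e], dim=-1)
            out = out + mlp(x).squeeze(-1)
        return out  # predicted tensor entries, one per (i, j, k) triple

# Usage: predict entries of a 100 x 80 x 60 sparse tensor.
model = AdditiveNeuralTD(dims=(100, 80, 60))
i = torch.randint(0, 100, (32,)); j = torch.randint(0, 80, (32,)); k = torch.randint(0, 60, (32,))
pred = model(i, j, k)  # shape (32,)
```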
The reproducibility of scientific articles is central to the advancement of science. Despite this importance, evaluating reproducibility remains challenging due to the scarcity of ground truth data. Predictive models can address this limitation by streamlining the tedious evaluation process. Typically, a paper's reproducibility is inferred based on the availability of artifacts such as code, data, or supplemental information, often without extensive empirical investigation. To address these issues, we utilized artifacts of papers as fundamental units to develop a novel, dual-spectrum framework that focuses on author-centric and external-agent perspectives. We used the author-centric spectrum, followed by the external-agent spectrum, to guide a structured, model-based approach to quantify and assess reproducibility. We explored the interdependencies between different factors influencing reproducibility and found that linguistic features such as readability and lexical diversity are strongly correlated with papers achieving the highest statuses on both spectrums. Our work provides a model-driven pathway for evaluating the reproducibility of scientific research.
Theory of mind (ToM) reasoning involves understanding that others have intentions, emotions, and thoughts, which is crucial for regulating one's reasoning. Although large language models (LLMs) excel in tasks such as summarization, question answering, and translation, they still face challenges with ToM reasoning, especially in open-ended questions. Despite advancements, the extent to which LLMs truly understand ToM reasoning and how closely it aligns with human ToM reasoning remains inadequately explored in open-ended scenarios. Motivated by this gap, we assess the abilities of LLMs to perceive and integrate human intentions and emotions into their ToM reasoning processes within open-ended questions. Our study utilizes posts from Reddit's ChangeMyView platform, which demands nuanced social reasoning to craft persuasive responses. Our analysis, comparing semantic similarity and lexical overlap metrics between responses generated by humans and LLMs, reveals clear disparities in ToM reasoning capabilities in open-ended questions, with even the most advanced models showing notable limitations. To enhance LLM capabilities, we implement a prompt tuning method that incorporates human intentions and emotions, resulting in improvements in ToM reasoning performance. However, despite these improvements, the enhancement still falls short of fully achieving human-like reasoning. This research highlights the deficiencies in LLMs' social reasoning and demonstrates how integrating human intentions and emotions can boost their effectiveness.
With the development of Intelligent Transportation Systems, a great deal of work has been proposed to tackle traffic prediction tasks. Despite their good performance, most traffic prediction models are point estimation models, lacking the capability to estimate the uncertainty of future traffic data, which is crucial in practical traffic decision-making. To address this problem, we combine the probabilistic estimation capabilities of conditional normalizing flows with the spatio-temporal relationship learning of spatio-temporal graphs, leading to a Spatio-Temporal Graph Normalizing Flow (STGNF) model that estimates the distribution of future traffic data. We are the first to employ conditional normalizing flows as the backbone for probabilistic traffic prediction. We then design a spatio-temporal graph conditional fusion network to learn the spatio-temporal relationships between future and historical traffic data, which are provided to the conditional normalizing flows as conditional information. Extensive experiments on two real-world traffic datasets demonstrate that our proposed model significantly outperforms state-of-the-art baselines.
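The conditional-flow backbone can be illustrated with a single conditional affine transformation whose shift and scale are predicted from a context vector. This one-layer sketch stands in for the full STGNF model; the network shapes and conditioning scheme are assumptions.

```python
import math
import torch
import torch.nn as nn

class ConditionalAffineFlow(nn.Module):
    """One conditional affine flow step: a context vector h (e.g., a spatio-temporal
    encoding of historical traffic) predicts a per-dimension shift and log-scale
    mapping future traffic x to a standard-normal latent z."""
    def __init__(self, x_dim, h_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(h_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * x_dim))

    def log_prob(self, x, h):
        shift, log_scale = self.net(h).chunk(2, dim=-1)
        z = (x - shift) * torch.exp(-log_scale)                 # inverse transform
        log_base = -0.5 * (z ** 2).sum(-1) - 0.5 * x.size(-1) * math.log(2 * math.pi)
        log_det = -log_scale.sum(-1)                            # log|det dz/dx|
        return log_base + log_det

    def sample(self, h):
        shift, log_scale = self.net(h).chunk(2, dim=-1)
        return torch.randn_like(shift) * torch.exp(log_scale) + shift

# Usage: train by maximizing the conditional log-likelihood of future traffic.
flow = ConditionalAffineFlow(x_dim=10, h_dim=32)
x, h = torch.randn(8, 10), torch.randn(8, 32)
nll = -flow.log_prob(x, h).mean()
```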
Citation Text Generation (CTG) in scientific documents often relies on standard summarization techniques, which may not fully capture the nuanced relationship between the citing and cited papers. To address this, we present a Multi-Source Citation Text Generation (M-CTG) architecture, leveraging a Seq2Seq transformer framework enhanced with keyphrase embeddings, graph embeddings, and text representations. This approach aims to produce more contextually relevant and accurate citation texts by integrating multiple sources of information. Our methodology is tested using the newly created CTG-S2ORC dataset, consisting of English-language computer science research papers. In a comparative analysis, we explore the performance of traditional Language Models (LMs) and demonstrate how Large Language Models (LLMs), particularly when integrated with various prompting techniques and Knowledge Graphs, offer superior capabilities in analyzing and generating citation texts. In addition to traditional evaluation metrics, we introduce a custom metric that emphasizes the overlap of key terms and semantic similarity, providing a more comprehensive assessment of our model's performance. Our code and data are available at https://github.com/midas-research/M-CTG/tree/main.
Out-of-distribution (OOD) aware classification aims to classify in-distribution samples into their respective classes while simultaneously detecting OOD samples. Previous works have largely focused on the image domain, where images from an unrelated dataset can serve as auxiliary OOD training data. In this work, we address OOD-aware classification for tabular data, where an unrelated dataset cannot be used as OOD training data. A potential solution is to filter out OOD samples with an outlier detection method and classify the remaining samples with a traditional classification model. However, seamlessly integrating this two-method pipeline into downstream optimization tasks is challenging. Our approach instead turns OOD-aware classification into traditional classification by augmenting the in-distribution training data with synthesized OOD data. This approach continues to leverage traditional classification methods while detecting OOD samples, and the learned model retains the same mathematical properties as traditional classification models, so it can be easily integrated into downstream tasks. We evaluate these benefits empirically using real-life datasets. Code is available at https://github.com/ah-ansari/OCT.
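A minimal sketch of this reduction is shown below. The OOD synthesis scheme (uniform sampling from an expanded bounding box) is an illustrative assumption, not necessarily the generator used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def augment_with_synthetic_ood(X, y, n_ood=None, margin=0.5, seed=0):
    """Append uniformly sampled points from an expanded bounding box as an
    extra 'OOD' class (illustrative generator; the paper's method may differ)."""
    rng = np.random.default_rng(seed)
    n_ood = n_ood or len(X)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = hi - lo
    ood = rng.uniform(lo - margin * span, hi + margin * span, size=(n_ood, X.shape[1]))
    ood_label = int(y.max()) + 1                      # reserve one extra class id for OOD
    X_aug = np.vstack([X, ood])
    y_aug = np.concatenate([y, np.full(n_ood, ood_label)])
    return X_aug, y_aug, ood_label

# Train a single, ordinary classifier that also flags OOD inputs.
X = np.random.default_rng(1).normal(size=(500, 4))
y = (X[:, 0] > 0).astype(int)
X_aug, y_aug, ood_label = augment_with_synthetic_ood(X, y)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_aug, y_aug)
pred = clf.predict(np.array([[10.0, 10.0, 10.0, 10.0]]))  # far-away point -> likely ood_label
```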
Large Language Models (LLMs) have shown impressive performance in various domains, prompting researchers to explore their potential application in recommendation systems. However, directly applying LLMs to recommendation tasks has proven to be less effective due to the significant gap between the data used for pre-training LLMs and the specific requirements of recommendation tasks. In this study, we propose Direct Multi-Preference Optimization (DMPO), a streamlined framework to bridge this gap and enhance the alignment of LLMs for recommendation tasks. DMPO can be viewed as a pair-wise ranking loss to distinguish between positive and negative samples in recommendation tasks. Furthermore, DMPO improves the performance of LLM-based recommenders by maximizing the probability of positive samples and minimizing the probability of multiple negative samples at the same time. Experimental evaluations are conducted to compare DMPO with traditional recommendation methods and other LLM-based recommendation methods. The results reveal that DMPO significantly enhances the recommendation capabilities of LLMs across three real-world public datasets in few-shot scenarios. Furthermore, the experiments also demonstrate that DMPO exhibits superior generalization ability in cross-domain recommendation. A case study elucidates the reasons behind these consistent improvements and also underscores DMPO's potential as an explainable recommendation system. Our code and data are available at https://github.com/BZX667/DMPO.
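The multi-negative ranking idea can be illustrated with a generic softmax-style loss over item (log-)scores, pushing the positive above several negatives at once. This is a hedged stand-in rather than the exact DMPO objective; the temperature and the choice of score inputs are assumptions.

```python
import torch
import torch.nn.functional as F

def multi_negative_ranking_loss(pos_logp, neg_logps, tau=1.0):
    """Softmax-style ranking loss: raise the positive item's score above multiple
    negatives simultaneously. pos_logp: (batch,), neg_logps: (batch, n_neg).
    A generic stand-in, not the exact DMPO objective."""
    logits = torch.cat([pos_logp.unsqueeze(1), neg_logps], dim=1) / tau
    target = torch.zeros(logits.size(0), dtype=torch.long)   # the positive sits at index 0
    return F.cross_entropy(logits, target)

# Usage with toy sequence log-likelihoods for one positive and three negatives per user.
pos = torch.tensor([-2.1, -1.7])
neg = torch.tensor([[-2.5, -3.0, -2.8], [-1.9, -2.4, -2.2]])
loss = multi_negative_ranking_loss(pos, neg)
```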
Pre-trained Foundation Models (PFMs) have ushered in a paradigm shift in AI, due to their ability to learn general-purpose representations that can be readily employed in downstream tasks. While PFMs have been successfully adopted in various fields such as NLP and Computer Vision, their capacity to handle geospatial data remains limited. This can be attributed to the intrinsic heterogeneity of such data, which encompasses different types, including points, segments, and regions, as well as multiple information modalities. The proliferation of Volunteered Geographic Information initiatives, like OpenStreetMap (OSM), unveils a promising opportunity to bridge this gap. In this paper, we present CityFM, a self-supervised framework to train a foundation model within a selected geographical area. CityFM relies solely on open data from OSM, and produces multimodal representations incorporating spatial, visual, and textual information. We analyse the entity representations generated by our foundation models from a qualitative perspective, and conduct experiments on road-, building-, and region-level downstream tasks. In all the experiments, CityFM achieves performance superior to, or on par with, application-specific algorithms.
We present Learning Attributions (LA), a novel method for explaining language models. The core idea behind LA is to train a dedicated attribution model that functions as a surrogate explainer for the language model. This attribution model is designed to identify which tokens are most influential in driving the model's predictions. By optimizing the attribution model to mask the minimal amount of information necessary to induce substantial changes in the language model's output, LA provides a mechanism to understand which tokens in the input are critical for the model's decisions. We demonstrate the effectiveness of LA across several language models, highlighting its superiority over multiple state-of-the-art explanation methods across various datasets and evaluation metrics.
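The training signal described above can be sketched as a trade-off between how much the masked input changes the language model's prediction and how much of the input is masked. This is a hypothetical objective consistent with the description; the mask parameterization and loss weighting are assumptions.

```python
import torch

def attribution_loss(orig_logits, masked_logits, mask_probs, sparsity_weight=1.0):
    """Encourage the attribution model to find a small mask whose removal changes
    the language model's output as much as possible (illustrative objective).
    orig_logits, masked_logits: (batch, n_classes); mask_probs: (batch, seq_len)
    in [0, 1], where 1 means the token is masked out."""
    # Reward divergence between the original and masked predictions...
    p_orig = torch.softmax(orig_logits, dim=-1)
    logp_mask = torch.log_softmax(masked_logits, dim=-1)
    change = -(p_orig * logp_mask).sum(-1).mean()   # cross-entropy: larger = bigger change
    # ...while penalizing how much of the input is masked.
    sparsity = mask_probs.mean()
    return -change + sparsity_weight * sparsity

# Toy usage with random tensors standing in for model outputs.
loss = attribution_loss(torch.randn(4, 2), torch.randn(4, 2), torch.rand(4, 16))
```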
Graphs are a fundamental data structure used to represent relationships in domains as diverse as the social sciences, bioinformatics, cybersecurity, the Internet, and more. One of the central observations in network science is that real-world graphs are globally sparse, yet contain numerous "pockets" of high edge density. A fundamental task in graph mining is to discover these dense subgraphs. Most common formulations of the problem involve finding a single (or a few) "optimally" dense subsets. But in most real applications, one does not care about optimality. Instead, we want to find a large collection of dense subsets that covers a significant fraction of the input graph. We give a mathematical formulation of this problem, using a new definition of regularly triangle-rich (RTR) families. These families capture the notion of dense subgraphs that contain many triangles and have degrees comparable to the subgraph size. We design a provable algorithm, RTRExtractor, that can discover RTR families that approximately cover any RTR set. The algorithm is efficient and is inspired by recent results that use triangle counts for community testing and clustering. We show that RTRExtractor has excellent behavior on a large variety of real-world datasets. It is able to process graphs with hundreds of millions of edges within minutes. Across many datasets, RTRExtractor achieves high coverage using sets of high edge density. For example, the output covers a quarter of the vertices with subgraphs of edge density more than (say) 0.5, for datasets with 10M+ edges. We show an example of how the output of RTRExtractor correlates with meaningful sets of similar vertices in a citation network, demonstrating the utility of RTRExtractor for unsupervised graph discovery tasks.
Numerous algorithms have been proposed for discovering denial constraints (DCs), which are essential and effective for maintaining data consistency. However, existing methods focus only on discovering the complete set of DCs, often resulting in hundreds or even tens of thousands of discovered rules. Such a large number of DCs is impractical for users to verify and utilize. Moreover, these methods overlook the intent of users, which requires the discovered DCs to be succinct, relevant, and diverse at the same time. To address these limitations, we introduce DCMiner, a deep reinforcement learning (DRL)-based framework that produces rules satisfying user preferences. Specifically, we first model the discovery process as a kCover Markov decision process to improve efficiency. Then, a graphQ model is introduced to capture the data distribution and facilitate the discovery of DCs. Lastly, we design a reward function that flexibly integrates both objective and subjective criteria to align the discovered rules with user intent, and we propose an efficient training process. Extensive experiments on both real-world and synthetic datasets show that DCMiner can discover succinct, relevant, and diverse rules.
Recently, generative models based on the diffusion process have emerged as a promising direction for automating the design of molecules. However, directly adding continuous Gaussian noise to discrete graphs leads to the problem that the generated data do not conform to the discrete graph data distribution in the training set. Current graph diffusion models either corrupt discrete data through a transition matrix or relax the discrete data to continuous space for the diffusion process. These approaches make it hard to perform extensible conditional generation, such as adapting to text-based conditions, due to the lack of embedding representations, and they require significant computational resources due to the diffusion process over the bond-type matrix. This paper introduces the Hierarchical Graph Latent Diffusion Model (HGLDM), a novel variant of latent diffusion models that overcomes the problem of applying continuous diffusion models directly to discrete graph data. Meanwhile, based on the latent diffusion framework, HGLDM avoids the issues of computational consumption and lack of embeddings for extensible conditional generation. In addition, by comparing HGLDM with its variant, the Graph Latent Diffusion Model (GLDM), which only has graph-level embeddings, we validate the advantage of the hierarchical graph structure for capturing the relationship between structural information and molecular properties. We evaluate our model on various conditional generation tasks, demonstrating its superior performance.
Edge-computing methods allow devices to efficiently train a high-performing, robust, and personalized model for predictive tasks. However, these methods succumb to privacy and scalability concerns such as adversarial data recovery and expensive model communication. Furthermore, edge computing methods unrealistically assume that all devices train an identical model. In practice, edge devices have varying computational and memory constraints, which may not allow certain devices to have the space or speed to train a specific model. To overcome these issues, we propose MIDDLE, a model-independent distributed learning algorithm that allows heterogeneous edge devices to assist each other in training while communicating only non-sensitive information. MIDDLE unlocks the ability for edge devices, regardless of computational or memory constraints, to assist each other even with completely different model architectures. Furthermore, MIDDLE does not require model or gradient communication, significantly reducing communication size and time. We prove that MIDDLE attains the optimal convergence rate of stochastic gradient descent for convex and non-convex smooth optimization. Finally, our experimental results demonstrate that MIDDLE attains robust and high-performing models without model or gradient communication.
The significance of mental health classification is paramount in contemporary society, where digital platforms serve as crucial sources for monitoring individuals' well-being. However, existing social media mental health datasets primarily consist of text-only samples, potentially limiting the efficacy of models trained on such data. Recognising that humans utilise cross-modal information to comprehend complex situations or issues, we present a novel approach to address the limitations of current methodologies. In this work, we introduce a Multimodal and Multi-Teacher Knowledge Distillation model for Mental Health Classification, leveraging insights from cross-modal human understanding. Unlike conventional approaches that often rely on simple concatenation to integrate diverse features, our model addresses the challenge of appropriately representing inputs of varying natures (e.g., texts and sounds). To mitigate the computational complexity associated with integrating all features into a single model, we employ a multimodal and multi-teacher architecture. By distributing the learning process across multiple teachers, each specialising in a particular feature extraction aspect, we enhance the overall mental health classification performance. Through experimental validation, we demonstrate the efficacy of our model in achieving improved performance.
Rank fusion is a technique for combining multiple rankings into a single aggregated ranking, commonly used in high-stakes applications. For hiring decisions, a fused ranking might combine evaluations of different candidates from various job boards into one list. Ideally, such fused rankings are fair, meaning they do not withhold opportunities or resources from marginalized groups of candidates, even if such biases are present in the to-be-fused rankings. Prior work on fairly aggregating rankings is limited to ensuring proportional fairness (it does not address equal representation) when combining ranked lists containing the same candidate items. Yet, real-world fusion tasks often combine rankings over varying candidate sets, may also contain relevance scores, or are better suited to equal representation. To address fairness in these settings, we present a new plug-and-play fairness-aware fusion strategy: WISE fusion. WISE works in fusion settings where we have closed-box access to a score-powered rank fusion (SRF) method, making it possible to fairness-enhance existing fusion pipelines at little added cost. WISE uses existing evaluations of candidates from an as-is SRF method to achieve proportional or equal rank fairness in the final fused ranking. Our experimental study demonstrates that WISE beats the fairness and utility performance of state-of-the-art methods applied to these new fair rank fusion settings.
Trajectory similarity computation is a fundamental problem in various applications (e.g., transportation optimization, behavioral study). Recent research learns trajectory representations instead of performing point matching to achieve more accurate and efficient trajectory similarity computation. However, these methods still cannot scale to large datasets due to their high computational cost. In this paper, we propose a novel hash learning method that encodes trajectories into binary hash codes and computes trajectory similarities via Hamming distance, which is much more efficient. To the best of our knowledge, this is the first work to apply hash learning to trajectory similarity computation. Furthermore, unlike the Word2Vec model based on a random-walk strategy, we utilize hypergraph neural networks for the first time to learn grid representations by constructing hyperedges according to real-life trajectories, resulting in more representative grid embeddings. In addition, we incorporate a residual network into the multi-layer GRU to learn more discriminative trajectory representations. The proposed <u>H</u>ypergraph <u>H</u>ash <u>L</u>earning for <u>T</u>rajectory similarity computation is an end-to-end framework named HHL-Traj. Experimental results on two real-world trajectory datasets (i.e., Porto and Beijing) demonstrate that the proposed framework achieves up to 6.23% and 15.42% accuracy gains compared with state-of-the-art baselines in unhashed and hashed cases, respectively. The efficiency of trajectory similarity computation based on hash codes is also verified. Our code is available at https://github.com/caoyuan57/HHL-Traj.
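The efficiency argument rests on replacing float-vector distances with Hamming distances over binary codes. A minimal sketch follows; the sign binarization and 64-bit code length are assumptions, and the learned hashing network of HHL-Traj is not reproduced here.

```python
import numpy as np

def binarize(embeddings):
    """Turn real-valued trajectory embeddings into packed binary hash codes."""
    bits = (embeddings > 0).astype(np.uint8)          # sign binarization (illustrative)
    return np.packbits(bits, axis=1)                  # 8 bits per byte for compact storage

def hamming_distances(query_code, codes):
    """Hamming distance between one packed code and many, via XOR + popcount."""
    xor = np.bitwise_xor(codes, query_code)
    return np.unpackbits(xor, axis=1).sum(axis=1)

# Usage: 64-bit codes for 10k trajectories, ranked by Hamming distance to a query.
emb = np.random.default_rng(0).normal(size=(10_000, 64))
codes = binarize(emb)
dists = hamming_distances(codes[0], codes)
nearest = np.argsort(dists)[:10]
```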
Stock price prediction has been a challenging problem due to non-stationary dynamics and complex market dependencies. Existing work has two limitations: 1. Previous studies have underestimated the importance of market trends, relying solely on stock data to learn patterns and capture market regularities implicitly. However, due to random stock fluctuations and trading noise caused by market sentiment, it is difficult to learn underlying market trends, resulting in poor model performance. 2. Prior research has predominantly concentrated on time-aligned feature correlations, with limited exploration of cross-time stock correlations. To address these issues, we propose a novel framework, MATCC (Market Trend and Cross-time Correlation model). It explicitly extracts market trends as guiding information, decomposes stock data into trend and fluctuation components, and employs a carefully designed structure for mining cross-time correlation. Extensive experiments demonstrate that MATCC significantly outperforms previous works in both ranking and portfolio-based metrics. Additionally, we illustrate the influence of trends and correlations on stock prediction through visualization. We publish our code at https://github.com/caozhiy/MATCC.
The rapid spread of fake news on social media has caused great harm to society in recent years, making the detection of fake news an urgent task. Recent methods utilize the interactions among different entities such as authors, subjects, and news articles to model news propagation as a static heterogeneous information network (HIN). However, this is suboptimal since fake news emerges dynamically, and the latent chronological interactions between news items in the HIN are essential signals for fake news detection. To this end, we model the dynamics of news and associated entities as a News-Driven Dynamic Heterogeneous Information Network (News-DyHIN), where the temporal relationships among news articles are captured with meta-path based temporal neighbors. Building on News-DyHIN, we propose a novel fake news detection framework, named <u>D</u>ynam<u>i</u>c <u>H</u>ierarchical <u>A</u>ttention <u>N</u>etwork (DiHAN), which learns news representations via a hierarchical attention mechanism that fuses temporal interactions among news articles. In particular, DiHAN first employs temporal node-level attention to learn temporal information from meta-path based news neighbors through the modeled News-DyHIN. Then, a semantic attention layer is adopted to fuse different types of meta-path based temporal information for news representation learning. Extensive evaluations conducted on two public real-world datasets demonstrate that our proposed DiHAN achieves significant improvements over established baseline models.
Community search is a personalized community discovery problem designed to identify densely connected subgraphs containing the query node. Recently, community search in heterogeneous information networks (HINs) has received considerable attention. Existing methods typically focus on modeling relationships in HINs through predefined meta-paths or user-specified relational constraints. However, metapath-based methods are primarily designed to identify single-type communities with nodes of the same type rather than multi-type communities involving nodes of different types. Constraint-based methods require users to have a good understanding of community patterns to define a suitable set of relational constraints, which increases the burden on users. In this paper, we propose FCS-HGNN, a novel method for flexibly identifying both single-type and multi-type communities in HINs. Specifically, FCS-HGNN extracts complementary information from different views and dynamically considers the contribution of each relation instead of treating them equally, thereby capturing more fine-grained heterogeneous information. Furthermore, to improve efficiency on large-scale graphs, we further propose LS-FCS-HGNN, which incorporates i) the neighbor sampling strategy to improve training efficiency, and ii) the depth-based heuristic search strategy to improve query efficiency. We conducted extensive experiments to demonstrate the superiority of our proposed methods over state-of-the-art methods, achieving average improvements of 14.3% and 11.1% on single-type and multi-type communities, respectively.
The performance of modern database management systems relies heavily on hundreds of adjustable knobs. Traditionally, these knobs are manually adjusted by database administrators, a process that is both inefficient and ineffective for tuning large-scale databases in cloud environments. Recent research has explored the use of machine learning techniques to enable the automatic tuning of database configurations. Although most existing learning-based methods achieve satisfactory results on static workloads, they often experience performance degradation and low sampling efficiency in real-world environments. According to our study, this is primarily due to a lack of safety guarantees during the configuration sampling process. To address these issues, we propose SafeTune, an online tuning system that adapts to dynamic workloads. Our core idea is to filter out a large number of configurations with potential risks during the configuration sampling process. We employ a two-stage filtering approach: the first stage utilizes a semi-supervised outlier ensemble with feature learning to achieve high-quality feature representations, and the second stage employs a ranking-based classifier to refine the filtering process. In addition, to alleviate the cold-start problem, we leverage historical tuning experience to provide high-quality initial samples during the initialization phase. We conducted comprehensive evaluations on static and dynamic workloads. Compared with offline baseline methods, SafeTune reduces unsafe configuration suggestions by 95.6%-98.6%. Compared with state-of-the-art methods, SafeTune improves cumulative performance by 10.5%-46.6% and tuning speed by 15.1%-35.4%.
Message passing (MP) is a popular paradigm for designing graph neural networks (GNNs) that iteratively aggregates neighbor information and updates node embeddings. However, this paradigm suffers from several issues. First, long-range information struggles to be fully utilized, known as over-squashing. Second, excessive MP layers lead to indistinguishable representations, referred to as over-smoothing. Finally, vanilla MPNNs struggle to train effectively on heterophilic graphs. In this paper, we provide a unified insight into these defects: node embeddings are sent to neighbors at a constant "pace" and are aggregated immediately. Such synchronicity makes embeddings closer to the output more important, i.e., local priority, which manifests as the aforementioned issues. Based on this, we propose Asyn-MPNN, an asynchronous framework that customizes the speed of information aggregation and can unify many popular GNNs. We further propose the automated asynchronous (aAsyn) layer, which achieves effects similar to Asyn-MPNN without introducing extra hyperparameters and can be integrated into any GNN. We validate aAsyn-MPNN through extensive experiments on both graph-level and node-level tasks, where it achieves leading results on the Long-Range Graph Benchmark.
Sequential recommendation approaches predict the next items (targets) by analyzing prefix subsequences. These methods primarily model the correlations between prefixes and targets but often neglect the inherent correlations among prefixes and items. In this paper, we propose a Prefix-Target Graph-based Sequential Recommendation Approach (PTSR), which constructs a prefix-target graph (PTG) to collect observed correlations among prefixes and targets. It utilizes a graph neural network to model these inherent correlations, thus improving the item representations used in the predictive model. Specifically, prefixes linked to the same target reflect similar intents, while targets linked to the same prefix indicate available choices. This allows the graph neural network to effectively capture high-level correlations among prefixes and items, enhancing recommendation accuracy. We conduct extensive experiments on four real-world datasets to demonstrate the superiority of PTSR compared to state-of-the-art (SOTA) sequential recommendation methods. The source code of the PTSR is available at https://github.com/TosakRin/PTSR.
Sequential recommendation has been receiving increasing attention from researchers. Existing sequential recommendation models leverage deep learning models to capture sequential features. However, these methods ignore confounders in the recommendation process, which can lead the model to learn incorrect correlations and fail to accurately capture users' true preferences. Moreover, these methods rely on extensive interaction sequences, but sequential data often suffers from sparsity issues. To address these limitations, this paper proposes a <u>P</u>reference-<u>a</u>ware <u>C</u>ausal <u>I</u>ntervention and Counter<u>f</u>a<u>c</u>tual Data Augmentation (Pacific) framework to enhance sequential recommendation. Initially, we model the causal graph of sequential recommendation and categorize user preferences into global long-term preferences, local long-term preferences, and short-term preferences. Then, we introduce the front-door criterion to eliminate the interference of confounders and design different self-attention mechanisms to estimate the causal effects, aiming to capture users' true preferences. In addition, based on counterfactual thinking, we design a counterfactual data augmentation module to generate enriched sequences. Experimental results on four real-world datasets demonstrate the superiority of our proposed approach over state-of-the-art sequential recommendation methods.
Large language models (LLMs) have been flourishing in the natural language processing (NLP) domain, and their potential for recommendation has attracted much attention. Despite the intelligence shown by recommendation-oriented fine-tuned models, LLMs struggle to fully understand user behavior patterns due to their innate weakness in interpreting numerical features and the overhead of long contexts: the temporal relations among user behaviors, subtle quantitative signals among different ratings, and various side features of items are not well explored. Existing works only fine-tune a single LLM on the given text data without introducing this important information, leaving these problems unsolved. In this paper, we propose ELCoRec to Enhance Language understanding with Co-propagation of numerical and categorical features for Recommendation. Concretely, we propose to inject preference-understanding capability into the LLM via a GAT expert model, in which the user preference is better encoded by propagating, in parallel, the temporal relations, rating signals, and various side information of historical items. This parallel propagation mechanism can stabilize heterogeneous features and offer an informative user-preference encoding, which is then injected into the language model via soft prompting at the cost of a single token embedding. To further capture the user's recent interests, we propose a novel Recent interaction Augmented Prompt (RAP) template. Experimental results over three datasets against strong baselines validate the effectiveness of ELCoRec.
Deep learning has achieved remarkable success in graph-related tasks, yet this accomplishment heavily relies on large-scale high-quality annotated datasets. However, acquiring such datasets can be cost-prohibitive, leading to the practical use of labels obtained from economically efficient sources such as web searches and user tags. Unfortunately, these labels often come with noise, compromising the generalization performance of deep networks. To tackle this challenge and enhance the robustness of deep learning models against label noise in graph-based tasks, we propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE). The core idea of ERASE is to learn representations with error tolerance by maximizing coding rate reduction. To the best of our knowledge, it is the first time that the error-resilient mechanism is introduced into graph representation learning against label noise. Particularly, we also propose a decoupled label propagation method to estimate coding rate reduction. Before training, noisy labels are pre-corrected through structural denoising. During training, ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience, which significantly improves the generalization performance in node classification. The proposed method allows us to more effectively withstand errors caused by mislabeled nodes, thereby strengthening the robustness of deep networks in handling noisy graph data. Extensive experimental results show the effectiveness of the proposed method. Codes: https://github.com/eraseai/erase.
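For context, the coding-rate-reduction quantity being maximized can be sketched numerically as below. This uses a generic form of the rate-reduction objective; the exact constants and the decoupled label-propagation estimate used by ERASE are not reproduced here.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z) = 1/2 * logdet(I + d/(m*eps^2) * Z Z^T) for Z of shape (d, m),
    with features as columns (standard coding-rate form; constants may differ
    from the exact expression used by ERASE)."""
    d, m = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (m * eps ** 2)) * Z @ Z.T)
    return 0.5 * logdet

def coding_rate_reduction(Z, labels, eps=0.5):
    """Delta R = R(Z) - sum_j (m_j/m) R(Z_j): rate of the whole representation
    minus the label-weighted rates of the per-class subsets (illustrative sketch)."""
    _, m = Z.shape
    whole = coding_rate(Z, eps)
    per_class = sum((np.sum(labels == c) / m) * coding_rate(Z[:, labels == c], eps)
                    for c in np.unique(labels))
    return whole - per_class

# Toy check: representations of two well-separated classes yield a larger reduction.
rng = np.random.default_rng(0)
Z = np.hstack([rng.normal(0, 1, (8, 50)), rng.normal(5, 1, (8, 50))])
y = np.array([0] * 50 + [1] * 50)
delta = coding_rate_reduction(Z, y)
```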
Traffic accidents pose a significant risk to human health and property safety. Therefore, predicting traffic accident risk has garnered growing interest as a means of prevention. We argue that a desirable prediction solution should be resilient to the complexity of traffic accidents. In particular, it should adequately consider the regional background, accurately capture both spatial proximity and semantic similarity, and effectively address the sparsity of traffic accidents. However, these factors are often overlooked or difficult to incorporate. In this paper, we propose a novel multi-granularity hierarchical spatio-temporal network. First, we incorporate remote sensing data, facilitating the creation of a hierarchical multi-granularity structure and the comprehension of regional background. We construct multiple high-level risk prediction tasks to enhance the model's ability to cope with sparsity. Subsequently, to capture both spatial proximity and semantic similarity, region features and multi-view graphs are encoded to distill effective representations. Additionally, we propose a message passing and adaptive temporal attention module that bridges different granularities and dynamically captures the temporal correlations inherent in traffic accident patterns. Finally, a multivariate hierarchical loss function is devised to account for the complexity of the prediction objective. Extensive experiments on two real datasets verify the superiority of our model against state-of-the-art methods.
Image inpainting, the task of reconstructing missing segments in corrupted images using available data, faces challenges in ensuring consistency and fidelity, especially under information-scarce conditions. Traditional evaluation methods, heavily dependent on the existence of unmasked reference images, inherently favor certain inpainting outcomes, introducing biases. Addressing this issue, we introduce an innovative evaluation paradigm that utilizes a self-supervised metric based on multiple re-inpainting passes. This approach, diverging from conventional reliance on direct comparisons in pixel or feature space with original images, emphasizes the principle of self-consistency to enable the exploration of various viable inpainting solutions, effectively reducing biases. Our extensive experiments across numerous benchmarks validate the alignment of our evaluation method with human judgment.
Discrete-Time Dynamic Graphs (DTDGs), which are prevalent in real-world implementations and notable for their ease of data acquisition, have garnered considerable attention from both academic researchers and industry practitioners. The representation learning of DTDGs has been extensively applied to model the dynamics of temporally changing entities and their evolving connections. Currently, DTDG representation learning predominantly relies on GNN+RNN architectures, which manifest the inherent limitations of both Graph Neural Networks (GNNs) and Recurrent Neural Networks (RNNs). GNNs suffer from the over-smoothing issue as the model architecture goes deeper, while RNNs struggle to capture long-term dependencies effectively. GNN+RNN architectures also grapple with scaling to large graph sizes and long sequences. Additionally, these methods often compute node representations separately and focus solely on individual node characteristics, thereby overlooking the behavioral intersections between the two nodes whose link is being predicted, such as instances where the two nodes appear together in the same context or share common neighbors.
This paper introduces a novel representation learning method for DTDGs, pivoting from the traditional GNN+RNN framework to a Transformer-based architecture. Our approach exploits the attention mechanism to concurrently process topological information within the graph at each timestamp and the temporal dynamics of the graph across timestamps, circumventing the aforementioned fundamental weaknesses of both GNNs and RNNs. Moreover, we enhance the model's expressive capability by incorporating the intersection relationships among nodes and integrating a multi-patching module. Extensive experiments conducted on six public dynamic graph benchmark datasets confirm our model's efficacy, achieving SOTA performance.
Social recommendation systems leverage the social relations among users to deal with the inherent cold-start problem in user-item interactions. However, previous models treat the social graph only as a static auxiliary to the user-item interaction graph, rather than digging out the hidden essentials and optimizing them for better recommendations. Thus, the potential of social influence remains under-explored. In this paper, we fill this gap by proposing a novel model for social influence learning that derives the essential influence patterns within user relationships. Our model views social influence from the perspectives of (1) the diversity of the neighborhood's influence on users, (2) the disentanglement of the neighborhood's influence on users, and (3) the exploration of underlying implicit social influence. To this end, we first employ a novel layerwise graph-enhanced variational autoencoder to reconstruct neighborhoods' representations, which learns the pattern of social influence and simulates the social profile of each user to overcome the sparsity of social relation data. Meanwhile, we introduce a layerwise graph attentive network to capture the most influential scope of the neighborhood. Finally, we adopt a dual sampling process to generate new social relations that enhance the social recommendation. Extensive experiments have been conducted on three widely-used benchmark datasets, verifying the superiority of our proposed model compared with representative approaches.
Cross-domain remote sensing image retrieval has been a hotspot in the past few years. Most of the existing methods focus on combining semantic learning with domain adaptation on a well-labeled source domain and an unlabeled target domain. However, they face two serious challenges. (1) They cannot deal with practical scenarios where the source domain lacks sufficient label supervision. (2) They suffer from severe performance degradation when the data distribution between the source domain and target domain becomes highly inconsistent. To address these challenges, we propose <u>D</u>omain <u>A</u>lignment with <u>L</u>arge <u>V</u>ision-language models for cross-domain remote sensing image retrieval (termed DALV). First, we design a dual-modality prototype guided pseudo-labeling mechanism, which leverages a pre-trained large vision-language model (i.e., CLIP) to assign pseudo-labels to all unlabeled source domain and target domain images. Second, we compute confidence scores for these pseudo-labels to distinguish their reliability. Next, we devise a loss reweighting strategy, which incorporates the confidence scores as weights in the contrastive loss to mitigate the impact of noisy pseudo-labels. Finally, low-rank adaptation fine-tuning is adopted to update our model and achieve domain alignment, yielding class-discriminative features. Extensive experiments on 12 cross-domain remote sensing image retrieval tasks show that our proposed DALV outperforms state-of-the-art approaches. The source code is available at https://github.com/ptyy01/DALV.
Hypergraphs provide a more flexible representation for group interactions in complex systems compared to ordinary graphs, where each hyperedge can connect any number of nodes. In practice, data modeled as hypergraphs often contain hyperedge importance values, which indicate the influence or popularity of the group collaborations. For example, in a co-authorship hypergraph, a paper (hyperedge) is co-authored by multiple authors (nodes). The number of citations a paper receives can be regarded as the importance value of its corresponding hyperedge, reflecting its academic influence and significance.
In this work, we introduce hyperedge importance estimation as a new problem in hypergraph learning. The flexibility of hyperedges enables hypergraph modeling to capture high-order relationships between entities, which has attracted widespread attention, and hyperedge importance values have proven highly valuable in many applications. To address this problem, we propose the Identity-aware Hypergraph Attention Network (ID-HAN) for efficient hyperedge importance estimation. ID-HAN employs a special attention mechanism to model the importance contribution of each node within a hyperedge, injecting identity information according to the hyperedge-dependent node labels. Additionally, a centrality-aware positional encoding module generates learnable positional embeddings of nodes and hyperedges based on the relative order of degree centrality and identity information, thereby enhancing the consistency between message passing and importance propagation. Extensive experiments on four real-world datasets demonstrate that ID-HAN significantly outperforms state-of-the-art hypergraph neural networks on the hyperedge importance estimation task.
The application of skyline queries on outsourced databases significantly aids online analysis, yet efficiently handling encrypted queries remains a formidable obstacle. Moreover, query outcomes are vulnerable to potential malicious cloud services. To circumvent these limitations, this work presents the <u>H</u>onest-<u>M</u>ajority and <u>M</u>aliciously <u>S</u>kyline <u>Q</u>uery scheme (HMMSQ), which facilitates efficient skyline queries while safeguarding the privacy of datasets, queries, and skylines, as well as detecting malevolent activities. The core of HMMSQ is an optimized skyline diagram constructed by a novel skyline region-splitting algorithm for accurate skyline queries. Furthermore, it mitigates the frequency of dataset accesses by leveraging a multi-path R-tree for secure skyline retrieval. Notably, the majority of malicious behavior detection is focused on the servers, thereby minimizing user authentication overhead. The complexity and security are thoroughly analyzed, and experimental evaluations on various datasets demonstrate its efficiency and practicality in terms of computational cost and communication overhead. Remarkably, HMMSQ outperforms existing methods in query latency, achieving up to an order of magnitude improvement.
Artificial intelligence has been applied to various aspects of online education to facilitate teaching and learning. However, few attempts have been made towards a complete AI-powered tutoring system. In this work, we explore the development of a full-fledged intelligent tutoring system based on large language models (LLMs). The proposed system, ChatTutor, powered by state-of-the-art LLMs, is equipped with automatic course planning and adjustment, informative instruction, and adaptive quiz offering and evaluation. ChatTutor is decomposed into three inter-connected core processes: interaction, reflection, and reaction. Each process is implemented by chaining LLM-powered tools along with dynamically updated memory modules. To demonstrate the mechanism of each working module and the benefits of structured memory control and adaptive reflection, we conduct a wide range of analyses based on statistical results and a user study. The analysis shows that the designed processes boost system consistency and stability under long-term interaction and intentional disruptions, with up to 5% and 20% increases in performance, respectively. Meanwhile, we also compare the system with scripts from a real-world online learning platform and discuss potential issues unique to LLM-based systems.
Scene Text Recognition (STR) is an important and challenging upstream task for building structured information databases, which involves recognizing text within images of natural scenes. Although current state-of-the-art (SOTA) models for STR exhibit high performance, they typically suffer from low inference efficiency due to their reliance on hybrid architectures comprised of visual encoders and sequence decoders. In this work, we propose a VIsion Permutable extractor for fast and efficient Scene Text Recognition (SVIPTR), which achieves an impressive balance between high performance and rapid inference speeds in the domain of STR. Specifically, SVIPTR leverages a visual-semantic extractor with a pyramid structure, characterized by the Permutation and combination of local and global self-attention layers. This design results in a lightweight and efficient model and its inference is insensitive to input length. Extensive experimental results on various standard datasets for both Chinese and English scene text recognition validate the superiority of SVIPTR. Notably, the SVIPTR-T (Tiny) variant delivers highly competitive accuracy on par with other lightweight models and achieves SOTA inference speeds. Meanwhile, the SVIPTR-L (Large) attains SOTA accuracy in single-encoder-type models, while maintaining a low parameter count and favorable inference speed. Our proposed method provides a compelling solution for the STR challenge, which greatly benefits real-world applications requiring fast and efficient STR. The code is publicly available at https://github.com/cxfyxl/VIPTR.
Join order optimization is pivotal in database query optimization, seeking the most efficient join sequence to reduce execution costs. As more tables are joined, the complexity surges, turning it into an NP-hard problem due to the exponential growth of possible orders. Deep reinforcement learning (DRL) has recently made significant strides, outperforming traditional algorithms by treating join selection as a Markov Decision Process to devise more effective strategies. Current methods struggle with integrating query semantics and plan structures, and they encounter issues with complex joins where bottom-up learning can lead to information loss. To tackle these issues, we present the Tree-based Selective State Space Models for Efficient Join Order Selection Learning (TESSM). This framework uses the Tree Mamba architecture to integrate join pattern graphs with execution plan nodes, enhancing long-term dependency information flow. A tiered training strategy improves the model's training precision and speed. Our approach has proven effective, as evidenced by JOB and TPC-H benchmark tests, showing TESSM's substantial improvements in query optimization efficiency and effectiveness.
The impressive performance of large language models (LLMs) has attracted considerable attention from the academic and industrial communities. Beyond how to construct and train LLMs, how to effectively evaluate and compare their capabilities has also been recognized as an important yet difficult problem. Existing paradigms rely on either human annotators or model-based evaluators to evaluate the performance of LLMs on different tasks. However, these paradigms often suffer from high cost, low generalizability, and inherited biases in practice, which make them incapable of supporting the sustainable development of LLMs in the long term. To address these issues, inspired by the peer review systems widely used in the academic publication process, we propose a novel framework that can automatically evaluate LLMs through a peer-review process. Specifically, for the evaluation of a specific task, we first construct a small qualification exam to select "reviewers" from a couple of powerful LLMs. Then, to actually evaluate the "submissions" written by different candidate LLMs, i.e., the evaluatees, we use the reviewer LLMs to rate or compare the submissions. The final ranking of evaluatee LLMs is generated based on the results provided by all reviewers. We conducted extensive experiments on both text summarization and non-factoid question-answering tasks with eleven LLMs including GPT-4. The results demonstrate the existence of bias when evaluating with a single LLM. Also, our PRE model outperforms all the baselines, illustrating the effectiveness of the peer review mechanism.
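The aggregation step can be illustrated with a simple qualification-weighted rating scheme. This is a hypothetical aggregation; the actual PRE framework's weighting and rating protocol may differ.

```python
from typing import Dict

def rank_evaluatees(ratings: Dict[str, Dict[str, float]],
                    qualification: Dict[str, float]):
    """ratings[reviewer][evaluatee] -> score; qualification[reviewer] -> exam accuracy.
    Returns evaluatees sorted by qualification-weighted average rating
    (an illustrative aggregation, not necessarily the one used by PRE)."""
    total_w = sum(qualification.values())
    evaluatees = {e for scores in ratings.values() for e in scores}
    weighted = {
        e: sum(qualification[rev] * scores[e] for rev, scores in ratings.items()) / total_w
        for e in evaluatees
    }
    return sorted(weighted.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage with two reviewer LLMs and three candidate LLMs.
ratings = {"reviewer_A": {"llm1": 7.5, "llm2": 6.0, "llm3": 8.0},
           "reviewer_B": {"llm1": 7.0, "llm2": 6.5, "llm3": 7.5}}
qualification = {"reviewer_A": 0.9, "reviewer_B": 0.7}
print(rank_evaluatees(ratings, qualification))
```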
Despite the massive attention given to time-series explanations due to their extensive applications, a notable limitation of existing approaches is their primary reliance on the time domain. This overlooks the inherent characteristic of time-series data containing both time and frequency features. In this work, we present Spectral eXplanation (SpectralX), an XAI framework that provides time-frequency explanations for time-series black-box classifiers. This easily adaptable framework enables users to "plug in" various perturbation-based XAI methods for any pre-trained time-series classification model to assess their impact on explanation quality without having to modify the framework architecture. Additionally, we introduce Feature Importance Approximations (FIA), a new family of perturbation-based XAI methods. These methods consist of feature insertion, deletion, and combination techniques to enhance computational efficiency and class-specific explanations in time-series classification tasks. We conduct extensive experiments on a generated synthetic dataset and various UCR Time-Series datasets to first compare the explanation performance of FIA and other existing perturbation-based XAI methods in both the time domain and the time-frequency domain, and then show the superiority of our FIA in the time-frequency domain with the SpectralX framework. Finally, we conduct a user study to confirm the practicality of our FIA within the SpectralX framework for class-specific, time-frequency-based time-series explanations.
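To make the deletion-style, time-frequency perturbation idea concrete, here is a minimal sketch (not the authors' FIA implementation) that scores each frequency band of a series by how much the predicted class probability drops when that band is zeroed out; `classify_proba` is a hypothetical black-box classifier and the STFT settings are arbitrary assumptions.

```python
# Minimal sketch of deletion-style perturbation importance in the time-frequency
# domain (illustrative only; not the paper's FIA method). `classify_proba` is a
# hypothetical black box returning class probabilities for a 1-D series.
import numpy as np
from scipy.signal import stft, istft

def frequency_band_importance(series, classify_proba, target_class, nperseg=64):
    """Score each frequency bin by the drop in class probability when it is deleted."""
    base_p = classify_proba(series)[target_class]
    freqs, _, Z = stft(series, nperseg=nperseg)
    scores = np.zeros(len(freqs))
    for k in range(len(freqs)):
        Z_pert = Z.copy()
        Z_pert[k, :] = 0.0                        # "delete" one frequency band
        _, perturbed = istft(Z_pert, nperseg=nperseg)
        perturbed = perturbed[: len(series)]      # istft may pad the signal
        scores[k] = base_p - classify_proba(perturbed)[target_class]
    return freqs, scores                          # larger drop => more important band
```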
Contrastive learning has been effectively utilized to enhance the training of sequential recommendation models by leveraging informative self-supervised signals. Most existing approaches generate augmented views of the same user sequence through random augmentation and subsequently maximize their agreement in the representation space. However, these methods often neglect the rationality of the augmented samples. Due to significant uncertainty, random augmentation can disrupt the semantic information and interest evolution patterns inherent in the original user sequences. Moreover, pulling semantically inconsistent sequences closer in the representation space can render the user sequence embeddings insensitive to variations in user preferences, which contradicts the primary objective of sequential recommendation. To address these limitations, we propose the Context-aware Diffusion-based Contrastive Learning for Sequential Recommendation, named CaDiRec. The core idea is to leverage context information to generate more reasonable augmented views. Specifically, CaDiRec employs a context-aware diffusion model to generate alternative items for the given positions within a sequence. These generated items are aligned with their respective context information and can effectively replace the corresponding original items, thereby generating a positive view of the original sequence. By considering two different augmentations of the same user sequence, we can construct a pair of positive samples for contrastive learning. To ensure representation cohesion, we train the entire framework in an end-to-end manner, with shared item embeddings between the diffusion model and the recommendation model. Extensive experiments on five benchmark datasets demonstrate the advantages of our proposed method over existing baselines.
As a fundamental technology in intelligent transportation systems (ITS), accurate traffic flow prediction has emerged as a critical challenge in real-time applications. Fully utilizing the traffic data and capturing the spatial-temporal correlations are key to improving a model's prediction ability. Numerous neural networks have been proposed to address this issue. However, most of these existing methods have the following two problems: 1) Lack of byroad information, meaning that the existing methods do not consider the byroads in real-life traffic environments; 2) Lack of potential learning ability, meaning that the existing methods tend to forget non-similar patterns and struggle to capture multi-hop correlations. To overcome these problems, we propose a novel Spatial Temporal Byroad-Aware Graph Convolution Network (ByGCN) in this paper. ByGCN consists of byroad identification and spatial temporal learning modules. In the first module, we design spatial temporal decoupling and graph diffusion blocks to identify the byroads and reconstruct them into the flow data. In the second module, with the help of spatial temporal attention and GCN, our module can capture the complex spatial temporal correlations. Experiments on four real-world traffic datasets demonstrate that ByGCN outperforms the state-of-the-art methods.
Current adversarial defense methods for GNNs exhibit critical limitations that obstruct real-world application: 1) inadequate adaptability to graph heterophily, 2) lack of generalizability to early GNNs such as GraphSAGE that are used downstream, and 3) low inference scalability, which is unacceptable for resource-constrained scenarios. To simultaneously address these challenges, we propose PROSPECT, the first online GNN-MLP distillation framework, which merges the complementary knowledge of MLPs and GNNs and can thus learn GNNs and MLPs that are robust against adversarial structure attacks on both homophilic and heterophilic graphs. PROSPECT integrates seamlessly into GraphSAGE and achieves inference scalability exponentially higher than conventional GNNs. To mitigate potential convergence failure caused by inductive bias conflicts between the heterogeneous MLP and GNN, we propose the Quasi-Alternating Cosine Annealing (QACA) learning rate scheduler, inspired by our convergence analysis of the involved MLP. Experiments on homophilic and heterophilic graphs demonstrate the advantages of PROSPECT over current defenses and offline GNN-MLP distillation methods in terms of adversarial robustness and clean accuracy, an inference scalability orders of magnitude higher than that of existing defenses, and the effectiveness of QACA.
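The abstract does not spell out the QACA schedule, so the sketch below only shows the standard cosine annealing with warm restarts that such a scheduler would presumably build on, plus one purely hypothetical "quasi-alternating" reading in which the MLP and GNN branches follow phase-shifted copies of the same curve.

```python
# Minimal sketch of cosine annealing with warm restarts, a standard schedule that a
# "quasi-alternating" variant such as QACA could build on (the exact QACA rule is not
# specified in the abstract; everything here is an assumed illustration).
import math

def cosine_annealing_lr(step, cycle_len, lr_max, lr_min=0.0):
    """Learning rate at `step`, restarting the cosine curve every `cycle_len` steps."""
    t = step % cycle_len
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / cycle_len))

def quasi_alternating_lr(step, cycle_len, lr_max):
    """One possible reading of 'quasi-alternating': the two branches follow
    phase-shifted cosine cycles (purely illustrative assumption)."""
    mlp_lr = cosine_annealing_lr(step, cycle_len, lr_max)
    gnn_lr = cosine_annealing_lr(step + cycle_len // 2, cycle_len, lr_max)
    return mlp_lr, gnn_lr
```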
Multimodal Named Entity Recognition (MNER) aims to achieve more accurate entity recognition by incorporating image information to assist text, which is particularly significant on social media platforms. Current research disproportionately emphasizes enhancing text with images, overlooking that the core of the NER task remains textual. The modal differences between images and text inevitably introduce noise when incorporating image information. Therefore, when textual information is sufficient to independently complete the NER task, the introduction of image information is unnecessary. This paper proposes an Adaptive Logical Decision Framework (ALDF) capable of determining whether textual information is sufficient for the NER task, deciding whether to introduce image information, avoiding unnecessary noise, and focusing more on information-scarce entities when image information is introduced. Specifically, we designed a Logic Reasoning Neural Network (LRNN) that uses an evidence-theory-based method to simulate human decision-making logic and generate decision support degrees for deciding whether image information should participate in the recognition task. When incorporating image information, we utilize the generated decision support degrees to guide the multi-head self-attention mechanism, enhancing the model's focus on information-scarce entities. Additionally, we employ a modality-aware progressive training method that can use decision information in real time during multimodal training and reduce information redundancy between modalities. Extensive experiments demonstrate that our model achieves state-of-the-art performance on popular public datasets.
Long-term time series forecasting (LTSF) has been widely applied in finance, traffic prediction, and other domains. Recently, patch-based transformers have emerged as a promising approach, segmenting data into sub-level patches that serve as input tokens. However, existing methods mostly rely on predetermined patch lengths, necessitating expert knowledge and posing challenges in capturing diverse characteristics across various scales. Moreover, time series data exhibit diverse variations and fluctuations across different temporal scales, which traditional approaches struggle to model effectively. In this paper, we propose a dynamic tokenizer with a dynamic sparse learning algorithm to capture diverse receptive fields and sparse patterns of time series data. In order to build hierarchical receptive fields, we develop a multi-scale Transformer model, coupled with multi-scale sequence extraction, capable of capturing multi-resolution features. Additionally, we introduce a group-aware rotary position encoding technique to enhance intra- and inter-group position awareness among representations across different temporal scales. Our proposed model, named DRFormer, is evaluated on various real-world datasets, and experimental results demonstrate its superiority compared to existing methods. Our code is available at: https://github.com/ruixindingECNU/DRFormer.
Cryptocurrencies are rapidly expanding and becoming vital in digital financial markets. However, the rise in cryptocurrency-related illicit activities has led to significant losses for users. To protect the security of these platforms, it is critical to identify illicit accounts effectively. Current detection methods mainly depend on feature engineering or are inadequate to leverage the complex information within cryptocurrency transaction networks, resulting in suboptimal performance. In this paper, we present DIAM, an effective method for detecting illicit accounts in cryptocurrency transaction networks modeled by directed multi-graphs with attributed edges. DIAM first features an Edge2Seq module that captures intrinsic transaction patterns from parallel edges by considering edge attributes and their directed sequences, to generate effective node representations. Then in DIAM, we design a multigraph Discrepancy (MGD) module with a tailored message passing mechanism to capture the discrepant features between normal and illicit nodes over the multigraph topology, assisted by an attention mechanism. DIAM integrates these techniques for end-to-end training to detect illicit accounts from legitimate ones. Extensive experiments, comparing against 15 existing solutions on 4 large cryptocurrency datasets of Bitcoin and Ethereum, demonstrate that DIAM consistently outperforms others in accurately identifying illicit accounts. For example, on a Bitcoin dataset with 20 million nodes and 203 million edges, DIAM attains an F1 score of 96.55%, markedly surpassing the runner-up's score of 83.92%. The code is available at https://github.com/TommyDzh/DIAM.
Graph-level representation learning is important in a wide range of applications. Existing graph-level models are generally built on i.i.d. assumption for both training and testing graphs. However, in an open world, models can encounter out-of-distribution (OOD) testing graphs that are from different distributions unknown during training. A trustworthy model should be able to detect OOD graphs to avoid unreliable predictions, while producing accurate in-distribution (ID) predictions. To achieve this, we present SGOOD, a novel graph-level OOD detection framework. We find that substructure differences commonly exist between ID and OOD graphs, and design SGOOD with a series of techniques to encode task-agnostic substructures for effective OOD detection. Specifically, we build a super graph of substructures for every graph, and develop a two-level graph encoding pipeline that works on both original graphs and super graphs to obtain substructure-enhanced graph representations. We then devise substructure-preserving graph augmentation techniques to further capture more substructure semantics of ID graphs. Extensive experiments against 11 competitors on numerous graph datasets demonstrate the superiority of SGOOD, often surpassing existing methods by a significant margin. The code is available at https://github.com/TommyDzh/SGOOD.
Recently, the issue of adversarial robustness in the time series domain has garnered significant attention. However, the available defense mechanisms remain limited, with adversarial training being the predominant approach, though it does not provide theoretical guarantees. Randomized Smoothing has emerged as a standout method due to its ability to certify a provable lower bound on the robustness radius under lp-ball attacks. Recognizing its success, research in the time series domain has started focusing on these aspects. However, existing work predominantly addresses time series forecasting, or non-lp robustness via statistical feature augmentation for time series classification (TSC). Our review found that Randomized Smoothing performs modestly in TSC, struggling to provide effective assurances on datasets with poor robustness. Therefore, we propose a self-ensemble method to enhance the lower bound of the probability confidence of predicted labels by reducing the variance of classification margins, thereby certifying a larger radius. This approach also addresses the computational overhead of Deep Ensembles (DE) while remaining competitive and, in some cases, outperforming them in terms of robustness. Both theoretical analysis and experimental results validate the effectiveness of our method, demonstrating superior performance in robustness testing compared to baseline approaches.
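For readers unfamiliar with the certification machinery referenced above, the following is a minimal sketch of standard Gaussian randomized smoothing (Cohen et al.-style prediction and certified radius); the paper's self-ensemble variance reduction is not reproduced, and `classify` is a hypothetical base classifier.

```python
# Minimal sketch of Gaussian randomized smoothing for a time-series classifier.
# `classify` is a hypothetical base classifier mapping a series to a class label.
import numpy as np
from scipy.stats import norm

def smoothed_predict(series, classify, sigma, n_samples, num_classes, seed=None):
    """Majority-vote prediction under Gaussian input noise, plus a certified l2 radius."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n_samples):
        noisy = series + rng.normal(scale=sigma, size=series.shape)
        counts[classify(noisy)] += 1
    top = int(np.argmax(counts))
    p_a = counts[top] / n_samples         # a rigorous implementation would lower-bound
    p_a = min(p_a, 1.0 - 1e-6)            # this with a binomial confidence interval
    radius = sigma * norm.ppf(p_a) if p_a > 0.5 else 0.0
    return top, radius
```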
Zero-Shot Hashing (ZSH) has attracted significant attention due to its efficiency and generalizability in multi-modal retrieval scenarios; it aims to encode semantic information into hash codes without needing labeled training samples of unseen classes. In addition to commonly used visual images as visual semantics and class labels as global semantics, the corresponding attribute descriptions contain critical local semantics with detailed information. However, most existing methods focus on leveraging the extracted attribute numerical values, without exploring the textual semantics in attribute descriptions. To bridge this gap, in this paper, we propose Prompt-based zero-shot hashing via vIsual and teXtual sEmantic aLignment, namely PIXEL. Concretely, we design an attribute prompt template based on attribute descriptions to make the model capture the corresponding local semantics. Then, having obtained the textual and visual embeddings, we propose an alignment module to model the intra- and inter-class contrastive distances. In addition, attribute-wise and class-wise constraints are utilized to collaboratively learn the hash code, image representation, and visual attributes more effectively. Finally, extensive experimental results demonstrate the superiority of PIXEL.
Zero-Shot Relational Learning (ZSRL) strives to predict relations that have not been observed during training, presenting a considerable challenge in terms of model generalization. Existing ZSRL methods usually utilize the prior knowledge of labels (e.g., text descriptions, ontological schemas) to enable knowledge transfer through learned features. Nonetheless, these methods remain limited to capturing the surface features exhibited by relations, failing to fully explore their underlying driving factors. This leads to insufficient discrimination between the shared and distinctive inherent components among relations, which consequently impedes the cognitive understanding required for advanced reasoning. In our study, we aim to identify and utilize shared factors that widely exist in the prior knowledge of classes to learn enhanced semantic representations via shared-factor composition, and develop our Factor-based ZSRL framework (FZR) with Generative Adversarial Networks (GANs) to bridge the gap between seen and unseen classes. FZR is designed to restructure the semantic space in such a way that it captures the essence of relation formation, thereby facilitating superior knowledge transfer in zero-shot scenarios. We conduct extensive experiments and evaluate our model on real-world datasets, and the results clearly demonstrate the effectiveness of the proposed model in zero-shot relational learning tasks.
Deep entity resolution (ER) identifies matching entities across data sources using techniques based on deep learning. It involves two steps: a blocker for identifying the potential matches to generate the candidate pairs, and a matcher for accurately distinguishing the matches and non-matches among these candidate pairs. Recent deep ER approaches utilize pretrained language models (PLMs) to extract similarity features for blocking and matching, achieving state-of-the-art performance. However, they often fail to balance the consensus and discrepancy between the blocker and matcher, emphasizing the consensus while neglecting the discrepancy. This paper proposes MutualER, a deep entity resolution framework that integrates and jointly trains the blocker and matcher, balancing both the consensus and discrepancy between them. Specifically, we firstly introduce a lightweight PLM in siamese structure for the blocker and a heavier PLM in cross structure or an autoregressive large language model (LLM) for the matcher. Two optimization techniques named Mutual Sample Selection (MSS) and Similarity Knowledge Transferring (SKT) are designed to jointly train the blocker and matcher. MSS enables the blocker and matcher to mutually select the customized training samples for each other to maintain the discrepancy, while SKT allows them to share the similarity knowledge for improving their blocking and matching capabilities respectively to maintain the consensus. Extensive experiments on five datasets demonstrate that MutualER significantly outperforms existing PLM-based and LLM-based approaches, achieving leading performance in both effectiveness and efficiency.
Time Series Segmentation (TSS) is a data mining task widely used in many applications to generate a set of change points for a time series. Current TSS performance analyses focus on accuracy and, therefore, fail to fully evaluate the reliability and originality of a segmentation. We investigate using uncertainty quantification (UQ) to fully evaluate TSS performance. We propose UQ-TSS, a framework to quantify uncertainties surrounding TSS. UQ-TSS captures uncertainties from different sources in an integrative manner. It incorporates a novel TS augmentation algorithm to address inherent uncertainty in the data. It uses ensemble learning in a novel way to create samples and estimate the probability distributions of changepoint presence and locations. We demonstrate the ability of UQ-TSS to guide hyperparameter selection, refine segmentations, and determine an algorithm's suitability for segmenting without the need for ground truth. We validate these claims through extensive experimentation using several well-established TSS algorithms and datasets.
Predicting stock price movements is a high-stakes task that demands explainability for human decision-makers. A key shortcoming in current methods is treating sub-predictions independently, without learning from accumulated experiences. We propose a novel triplet network for contrastive learning to enhance the explainability of stock movement prediction by considering instances of "integrated textual information and quantitative indicators". We refer to the target past-l-day tweet-price time series as the "anchor instance". Each anchor instance is paired with a "positive instance" characterized by highly correlated return trends yet significant differences across the entire feature space, and a "negative instance" that exhibits similar return trends along with high proximity in the feature space. The model is designed with the objective of (1) minimizing the cross entropy loss between input logits and target, (2) minimizing the distance between the anchor instances and positive instances, and (3) maximizing the distance between the anchor instances and negative instances. Our framework's effectiveness is demonstrated through extensive testing, showing superior performance on stock prediction benchmarks.
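A minimal sketch of the three-part objective described above, written as cross entropy plus a hinge-form triplet term that pulls anchor and positive embeddings together while pushing anchor and negative embeddings apart; the encoder, margin, and weighting are illustrative assumptions rather than the paper's exact formulation.

```python
# Minimal sketch of the three-part objective: (1) cross entropy on the prediction,
# (2) minimize anchor-positive distance, (3) maximize anchor-negative distance,
# combined here in a standard hinge (triplet margin) form. Margin/weight are assumptions.
import torch
import torch.nn.functional as F

def combined_loss(logits, labels, z_anchor, z_pos, z_neg, margin=1.0, alpha=0.5):
    ce = F.cross_entropy(logits, labels)               # (1) prediction loss
    d_pos = F.pairwise_distance(z_anchor, z_pos)       # (2) anchor-positive distance
    d_neg = F.pairwise_distance(z_anchor, z_neg)       # (3) anchor-negative distance
    triplet = F.relu(d_pos - d_neg + margin).mean()    # hinge form combining (2) and (3)
    return ce + alpha * triplet
```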
Spatial transcriptomics has transformed genomic research by measuring spatially resolved gene expressions, allowing us to investigate how cells adapt to their microenvironment via modulating their expressed genes. This essential process usually starts from cell-cell communication (CCC) via ligand-receptor (LR) interaction, leading to regulatory changes within the receiver cell. However, few methods have been developed to connect them to provide biological insights into intercellular regulation. To fill this gap, we propose iMiracle, an iterative multi-view graph neural network that models each cell's intercellular regulation with three key features. Firstly, iMiracle integrates inter- and intra-cellular networks to jointly estimate cell-type- and micro-environment-driven gene expressions. Optionally, it allows prior knowledge of intra-cellular networks as pre-structured masks to maintain biological relevance. Secondly, iMiracle employs iterative learning to overcome the sparsity of spatial transcriptomic data and gradually fill in the missing edges in the CCC network. Thirdly, iMiracle infers a cell-specific ligand-gene regulatory score based on the contributions of different LR pairs to interpret inter-cellular regulation. We applied iMiracle to nine simulated and eight real datasets from three sequencing platforms and demonstrated that iMiracle consistently outperformed ten methods in gene expression imputation and four methods in regulatory score inference. Lastly, we developed iMiracle as open-source software and anticipate that it can be a powerful tool in decoding the complexities of inter-cellular transcriptional regulation.
Training convolutional neural networks (CNNs) demands huge GPU memory consumption and training time, leading to increased carbon emissions and impacting sustainability. In this paper, we propose HotConv, a low-GPU-memory and low-carbon-footprint learning strategy for training the class of 1D CNNs that have a temporal max-pooling layer. Such CNNs are widely used in various domains for learning from large-sized inputs, including genomics and malware detection. HotConv reduces the GPU memory usage of such CNNs by harnessing the sparsity of relevant activations and gradients at the temporal max-pooling layer, and produces the same model as the full computation of activations and gradients, without trading off model performance. Evaluations using the public benchmark BODMAS and VirusTotal datasets for malware detection, with HotConv applied to the public MalConv network architecture, show that the carbon footprint reduction using HotConv is superior to existing approaches. For instance, HotConv uses only 1/22 of the GPU memory used by MalConv2 (the memory-efficient variant of MalConv), while also consuming less training time than MalConv2. This is equivalent to reducing the carbon footprint to as little as 1/4 of that of MalConv2 without compromising performance.
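The memory saving rests on a simple property of temporal max-pooling: only the winning time step per channel receives gradient, so the other activations never need to be kept for the backward pass. The toy numpy illustration below demonstrates this sparsity in isolation; it is a generic observation about max-pooling, not the HotConv algorithm itself.

```python
# Tiny illustration of the sparsity induced by a temporal (global) max-pooling layer:
# only the argmax time steps receive gradient, so all other activations are irrelevant
# to the backward pass. Shapes are arbitrary assumptions for the demonstration.
import numpy as np

acts = np.random.randn(4, 8, 1000)          # (batch, channels, time) activations
winners = acts.argmax(axis=2)               # one winning time step per (sample, channel)
grad_out = np.random.randn(4, 8)            # upstream gradient after max-pooling

grad_acts = np.zeros_like(acts)             # gradient w.r.t. the activations
b, c = np.meshgrid(np.arange(4), np.arange(8), indexing="ij")
grad_acts[b, c, winners] = grad_out         # nonzero only at 4*8 of the 32,000 positions
print(np.count_nonzero(grad_acts), "of", grad_acts.size, "gradient entries are nonzero")
```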
In the pursuit of intersectional group fairness in machine learning models, significant attention has been directed towards fair representation learning methods. These methods aim to mitigate bias in training data by encoding data effectively while removing sensitive attribute information. However, existing fair representation learning methods often assume that decoupling sensitive attribute information from the latent representation will automatically lead to fairness on any downstream tasks learnt on the non-sensitive subspace of the latent representation. Nonetheless, biases can persist even when using representations devoid of sensitive attribute information. This is due to the learning algorithm's influence during downstream task training. In this paper, we propose a method dubbed FairReg which integrates fairness regularization with fair representation learning. This unified approach creates a more comprehensive and robust framework for ensuring intersectional group fairness in machine learning models. Empirical evaluations conducted on two real-world depression prediction datasets demonstrate the effectiveness of our method in improving intersectional group fairness compared to existing approaches.
Path integration methods generate attributions by integrating along a trajectory from a baseline to the input. These techniques have demonstrated considerable effectiveness in the field of explainability research. While multiple types of baselines for the path integration process have been explored in the literature, there is no consensus on the ultimate one. This work examines the performance of different baseline distributions on explainability metrics and proposes a probabilistic path integration approach where the baseline distribution is modeled as a mixture of distributions, learned for each combination of model architecture and explanation metric. Extensive evaluations on various model architectures show that our method outperforms state-of-the-art explanation methods across multiple metrics.
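As background for the path-integration family discussed above, here is a minimal integrated-gradients sketch in which the baseline is drawn from a caller-supplied distribution; the learned mixture of baseline distributions proposed in the paper is not reproduced, and `sample_baseline` is a hypothetical helper.

```python
# Minimal sketch of path-integration attribution (integrated gradients) averaged over
# baselines sampled from a chosen distribution. The paper learns a mixture of baseline
# distributions per model/metric; here the baseline sampler is simply passed in.
import torch

def integrated_gradients(model, x, target, sample_baseline, n_baselines=8, steps=32):
    """Average IG attributions over several sampled baselines."""
    total = torch.zeros_like(x)
    for _ in range(n_baselines):
        baseline = sample_baseline(x)                    # e.g. noise, blur, constant image
        attr = torch.zeros_like(x)
        for s in range(1, steps + 1):
            point = baseline + (s / steps) * (x - baseline)
            point = point.clone().detach().requires_grad_(True)
            score = model(point)[..., target].sum()      # logit of the explained class
            grad, = torch.autograd.grad(score, point)
            attr += grad / steps                         # Riemann sum along the path
        total += (x - baseline) * attr                   # IG completeness scaling
    return total / n_baselines
```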
Learning-based models have shown promise in addressing query optimization challenges in the database field, where the learned cost model plays a central role. While these models outperform traditional optimizers on static datasets, their resilience and reliability in real-world applications remain a concern, limiting their widespread adoption. In this paper, we take a step towards a practical cost estimation model, named Tosure, which can quantify the uncertainty of cost estimation and generalizes to unseen databases accurately and efficiently. It consists primarily of two modules: a Cross-Database Representation (CDR) module and a Cost Estimation with Uncertainty (CEU) module. The CDR module captures transferable features by focusing on the minimal set via a deep-learning network, thereby enhancing the model's generalization capabilities. The CEU module introduces a novel Neural Network Gaussian Process (NNGP) to quantify the uncertainty in cost estimation, ensuring more robust estimations with an upper bound. To improve the model's performance, we perform pre-training on diverse large-scale datasets. Furthermore, we implement the model and integrate it with a traditional query optimizer to validate its usability and effectiveness in real-world scenarios. Extensive experimentation demonstrates that Tosure outperforms state-of-the-art methods, achieving a 20% improvement in cost estimation accuracy and a twofold improvement in robustness.
Multimodal recognition can achieve enhanced performance by leveraging the complementary information from different modalities. However, in real-world scenarios, multimodal samples often express discordant semantic meanings across modalities, lacking evident complementary information. Unlike humans, who can easily understand the intrinsic semantic information of these semantically discordant samples, existing multimodal recognition models perform poorly on them. Motivated by improving the robustness of multimodal recognition models in practical scenarios, this work poses a new challenge in multimodal recognition, coined Semantic Discordance Understanding. Unlike existing works that only focus on detecting semantically discordant samples as noisy data, this new challenge requires deep models to match humans' ability to understand the inherent semantic meanings of semantically discordant samples. To address this challenge, we further propose the Progressive Multimodal Pivot Learning (PMPL) approach, which introduces a learnable pivot memory to explore the inherent semantic meaning hidden under discordant modalities. To this end, our approach inserts Pivot Memory Learning (PML) modules into multiple layers of unimodal foundation models to progressively trade off the conflicting information across modalities. By introducing the multimodal pivot learning paradigm for multimodal recognition, the proposed PMPL approach can alleviate the negative effect of semantic discordance caused by the cross-modal information exchange mechanism of existing multimodal recognition models. Experiments on different benchmarks validate the superiority of our approach. Code is available at https://github.com/tiggers23/PMPL.
Missing values and unreleased figures are common but highly important for backtesting and real-time analysis in the financial industry, yet underexploited in the existing literature. In this paper, we focus on the issue of empirical asset pricing, where the cross-section of future asset returns is a function of lagged firm characteristics that vary in time frequencies and missing ratios. Most of the existing imputation methods cannot fully capture the complex and evolving spatio-temporal relations among firm-level characteristics. In particular, these methods fail to explicitly consider the spatial relations and feature structure in the stock network where we have to process granular data of thousands of stocks and hundreds of characteristics for each stock. To address these challenges, we propose a spatio-temporal diffusion model (STDM) that gradually recovers the masked financial data conditioning on high-dimensional stock-and-characteristics historical data. We propose characteristic-specific projection to construct characteristic-level features at both ends of the STDM, meanwhile maintaining firm-level features in the middle of the STDM to largely reduce the computational memory. Moreover, along with the temporal attention, we design a spatial graph convolutional network, making it computationally efficient and effective to learn time-varying spatio-temporal interdependence across firms. We further employ an implicit sampler that greatly accelerates the inference procedure so that the STDM is able to produce high-quality point and density estimates of missing and real-time firm characteristics within a few steps. We evaluate our model on the most comprehensive open-source dataset 'OSAP' and generate state-of-the-art performance in extensive experiments.
While new and effective methods for anomaly detection are frequently introduced, many studies prioritize the detection task without considering the need for explainability. Yet, in real-world applications, anomaly explanation, which aims to provide explanation of why specific data instances are identified as anomalies, is an equally important task. In this work, we present a novel approach for efficient and accurate model-agnostic anomaly explanation for tabular data using Predicate-based Association Rules (PARs). PARs can provide intuitive explanations not only about which features of the anomaly instance are abnormal, but also the reasons behind their abnormality. Our user study indicates that the anomaly explanation form of PARs is better comprehended and preferred by regular users of anomaly detection systems as compared to existing model-agnostic explanation options. Furthermore, we conduct extensive experiments on various benchmark datasets, demonstrating that PARs compare favorably to state-of-the-art model-agnostic methods in terms of computing efficiency and explanation accuracy on anomaly explanation tasks. The code for our experiments is available at https://github.com/cfeng783/PARs.
With the rapid development of the mobile internet, there is an increasing demand for quick access to effective data. Consequently, more research is focusing on data processing and optimization of recommendation systems in edge computing environments. However, in traditional edge computing environments, recommendation systems typically depend on frequent data query interactions among all edge servers to obtain results, which increases time delays. This issue is further exacerbated by the need to process large amounts of data within edge storage systems. To address this challenge, we propose an efficient recommendation method based on data allocation. Specifically, during the data allocation process, we first extract similar features of users, ensuring that the characteristics of some user data align as closely as possible with the overall user feature distribution. Then, using an improved consistent hashing algorithm, we achieve a uniform data layout, allowing the recommendation system to efficiently and accurately provide recommendations by querying data from the nearest edge server to the user. Finally, extensive experiments on real datasets show that our method significantly reduces time delays and improves the accuracy of recommendation results.
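For context, the sketch below shows the classic consistent-hashing ring with virtual nodes that the proposed allocation scheme improves upon; the improvement itself and the user-feature alignment step are not shown, and the server names are placeholders.

```python
# Minimal sketch of a consistent-hashing ring with virtual nodes, the classic technique
# the paper's improved allocation scheme builds on (the improvement and the user-feature
# alignment step are not reproduced here).
import bisect
import hashlib

def _h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        # Each edge server is hashed to many positions on the ring for a uniform layout.
        self._ring = sorted((_h(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self._keys = [k for k, _ in self._ring]

    def server_for(self, data_key: str) -> str:
        """Return the edge server responsible for `data_key` (clockwise successor)."""
        idx = bisect.bisect(self._keys, _h(data_key)) % len(self._ring)
        return self._ring[idx][1]

# Usage: ring = ConsistentHashRing(["edge-1", "edge-2", "edge-3"]); ring.server_for("user:42")
```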
Knowledge Tracing (KT) aims to determine whether students will respond correctly to the next question, which is a crucial task in intelligent tutoring systems (ITS). In educational KT scenarios, transductive ID-based methods often face severe data sparsity and cold start problems, where interactions between individual students and questions are sparse, and new questions and concepts consistently arrive in the database. In addition, existing KT models only implicitly consider the correlation between concepts and questions, lacking direct modeling of the more complex relationships in the heterogeneous graph of concepts and questions. In this paper, we propose a Structure-aware INductive Knowledge Tracing model with large language model (dubbed SINKT), which, for the first time, introduces large language models (LLMs) and realizes inductive knowledge tracing. Firstly, SINKT utilizes LLMs to introduce structural relationships between concepts and constructs a heterogeneous graph for concepts and questions. Secondly, by encoding concepts and questions with LLMs, SINKT incorporates semantic information to aid prediction. Finally, SINKT predicts the student's response to the target question by interacting with the student's knowledge state and the question representation. Experiments on four real-world datasets demonstrate that SINKT achieves state-of-the-art performance among 12 existing transductive KT models. Additionally, we explore the performance of SINKT on the inductive KT task and provide insights into various modules.
Clustering is fundamentally a subjective task: a single dataset can be validly clustered in various ways, and without further information, clustering systems cannot determine the appropriate clustering to perform. This underscores the importance of integrating constraints into clustering, enabling users to convey their preferences to the system. Active constraint-based clustering approaches prioritize the identification of the most valuable constraints to inquire about, striving to achieve effective clustering with the minimal number of constraints needed. We propose an Active Clustering with Diffusion Model (ACDM). ACDM applies the nearest-neighbor technique to construct a diffusion graph, and utilizes an online framework to refine the clustering result iteratively. In each iteration, (a) nodes with high uncertainty and representativeness are selected in batch mode, (b) then a novel neighborhood-set-based query is used for categorizing the selected nodes, using pairwise constraints, and (c) the categorized nodes are used as source nodes in the diffusion model for cluster refinement. We experimentally demonstrate that ACDM outperforms state-of-the-art methods in terms of clustering quality and scalability.
Click-Through Rate (CTR) prediction is a fundamental technique in recommendation and advertising systems. Recent studies have shown that implementing multi-scenario recommendations contributes to strengthening information sharing and improving overall performance. However, existing multi-scenario models only consider coarse-grained explicit scenario modeling that depends on pre-defined scenario identification from manual prior rules, which is biased and sub-optimal. To address these limitations, we propose a Scenario-Aware Hierarchical Dynamic Network for Multi-Scenario Recommendations (HierRec), which perceives implicit patterns adaptively, and conducts explicit and implicit scenario modeling jointly. In particular, HierRec designs a basic scenario-oriented module based on the dynamic weight to capture scenario-specific representations. Then the hierarchical explicit and implicit scenario-aware modules are proposed to model hybrid-grained scenario information, where the multi-head implicit modeling design contributes to perceiving distinctive patterns from different perspectives. Our experiments on two public datasets and real-world industrial applications on a mainstream online advertising platform demonstrate that HierRec outperforms existing models significantly. The implementation code is available for reproducibility.
Clinical notes provide a wealth of patient information that is valuable for predicting clinical outcomes. In particular, predicting hospital 30-day readmission is important for improving healthcare outcomes and reducing cost. Previous works on outcome prediction using clinical notes overlook complex semantic compositions and syntactic structure when learning the note-level embedding, which may fail to capture the note semantics and lead to inaccurate predictions.
To address these limitations, we propose a Compositional and Hierarchical Semantic Learning Model (CHSLM). It formulates the semantic learning of clinical notes into three hierarchies: word, composition, and note, and aggregates the semantics in a bottom-up manner. To aggregate the semantics from words to compositions, we construct heterogeneous medical-composition graphs to represent word interactions within and between medical compositions and use Graph Neural Networks to learn the composition embedding. To aggregate the semantics from composition- to note-level, we incorporate a mutual BiAffine transformation process. The experimental results on 30-day readmission prediction using two types of clinical notes demonstrate the effectiveness of our method over the state-of-the-art clinical prediction models.
Binary Neural Networks (BNNs) enable efficient deep learning by saving on storage and computational costs. However, as the size of neural networks continues to grow, meeting computational requirements remains a challenge. In this work, we propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks. The method learns binary vectors (i.e., tiles) to populate each layer of a model via aggregation and reshaping operations. During inference, the method reuses a single tile per layer to represent the full tensor. We apply the approach to both fully-connected and convolutional layers, which account for most of the parameters in typical neural architectures. Empirically, the approach achieves near full-precision performance on a diverse range of architectures (CNNs, Transformers, MLPs) and tasks (classification, segmentation, and time series forecasting) with up to an 8x reduction in size compared to binary-weighted models. We provide two implementations for Tiled Bit Networks: 1) we deploy the model to a microcontroller to assess its feasibility in resource-constrained environments, and 2) a GPU-compatible inference kernel to facilitate the reuse of a single tile per layer in memory.
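The core memory trick, reusing one small binary tile to represent a whole layer, can be illustrated with a few lines of numpy; how the tile is actually learned via aggregation and reshaping is the paper's contribution and is not reproduced here.

```python
# Minimal sketch of representing a binary weight matrix with one repeated tile.
# The tile contents and sizes below are arbitrary assumptions for illustration.
import numpy as np

def expand_tile(tile_bits, out_features, in_features, scale=1.0):
    """Repeat a 1-D {-1,+1} tile to fill an (out_features x in_features) weight matrix."""
    n_needed = out_features * in_features
    reps = -(-n_needed // tile_bits.size)               # ceil division
    flat = np.tile(tile_bits, reps)[:n_needed]
    return scale * flat.reshape(out_features, in_features)

tile = np.sign(np.random.randn(256)).astype(np.int8)    # a single 256-bit tile
W = expand_tile(tile, out_features=128, in_features=64) # 8192 weights stored as 256 bits
```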
Urban undisciplined events (UUE) are of increasing concern to urban officials because they reduce the quality of life and cause societal disorder. Accurately predicting future occurrences is key to preventing these events. However, existing supervised methods struggle to perform well on sparse UUEs, while self-supervised MAE-based methods adopt a traditional random masking strategy, which leads to limited performance on UUE forecasting. To address this, we design an innovative spatiotemporal masking strategy and its corresponding pre-training task, called Masked Spatio-Temporal Event Series Modeling (MSTEM). Through cluster-assisted region masking, MSTEM efficiently distributes masked regions evenly among different clusters, enhancing the model's ability to capture spatial correlation and heterogeneity while addressing the sparse regional distribution of UUEs. Frequency-enhanced patch masking helps the model to sufficiently extract the temporal features of UUEs by reconstructing multiple views. Additionally, we propose future merge and cluster label modeling to enhance the extraction of spatiotemporal dependencies, thereby improving the performance of MSTEM on downstream prediction tasks. Experimental evaluations on four real-world datasets including crimes and disorderly conduct show that our masked autoencoder with MSTEM outperforms most of the state-of-the-art baselines.
Sparse Knowledge Graphs (KGs), frequently encountered in real-world applications, contain fewer facts in the form of (head entity, relation, tail entity) compared to more populated KGs. The sparse KG completion task, which reasons answers for given queries in the form of (head entity, relation, ?) for sparse KGs, is particularly challenging due to the necessity of reasoning missing facts based on limited facts. Path-based models, known for excellent explainability, are often employed for this task. However, existing path-based models typically rely on external models to fill in missing facts and subsequently perform path reasoning. This approach introduces unexplainable factors or necessitates meticulous rule design. In light of this, this paper proposes an alternative approach by looking inward instead of seeking external assistance. We introduce a two-stage path reasoning model called LoGRe (Look Globally and Reason) over sparse KGs. LoGRe constructs a relation-path reasoning schema by globally analyzing the training data to alleviate the sparseness problem. Based on this schema, LoGRe then aggregates paths to reason out answers. Experimental results on five benchmark sparse KG datasets demonstrate the effectiveness of the proposed LoGRe model.
In graph anomaly detection (GAD), anomalous nodes usually exhibit high heterophily, while most Graph Neural Networks (GNNs) rest on homophily assumptions, which leads to poor performance. Many studies have attempted to solve this problem by employing a set of graph filters covering various frequencies. Their ultimate goal is to design the most appropriate spectral filter to capture the complex signals generated by normal and anomalous nodes. The critical aspect lies in the fusion of information from filters with different frequency response functions. However, existing methods lack a clear indicator to guide the fusion of information at different frequencies. In this paper, we find that local homophily is a valuable metric for assessing the weights of high- and low-frequency information at the node level, and explicitly point out that the accuracy of local homophily estimation is positively correlated with the accuracy of anomaly detection. Moreover, we unveil the phenomenon of camouflage among anomalous nodes, wherein these nodes disguise themselves by making their features resemble those of surrounding normal nodes.
Based on this investigation, we propose the Graph Local Homophily Network for Anomaly Detection (GLHAD). Specifically, we first identify the local homophily of the nodes in the graph under the supervision of the labeled nodes, where two contrasting paradigms are employed to resist the camouflage of anomalies. Then, the local homophily-based combination module combines low- and high-frequency signals based on the predicted local homophily. Eventually, the node representations of different layers are aggregated to make the final predictions. Comprehensive experiments on four anomaly detection datasets show that GLHAD outperforms other state-of-the-art baselines.
Existing non-exemplar class-incremental learning (NECIL) methods usually utilize a combination of a replay mechanism and knowledge distillation. However, this combination strategy only focuses on preserving old information quantitatively, ignoring the quality of preservation. When the old knowledge contains wrong or redundant information, catastrophic forgetting is more likely to occur. Therefore, obtaining as much information as possible without impurities, and removing invalid or even harmful information, has become an effective way to improve the performance of NECIL. This process is consistent with the information bottleneck (IB) theory. Thus, we propose a new NECIL method based on the IB framework. By using the different information obtained from the new and old class samples and the implicit knowledge in the teacher model training process, the method avoids learning harmful redundant information. Specifically, we propose two optimization strategies that align with the two optimization processes of the information bottleneck. Firstly, we employ a pseudo-prototype selection mechanism that selectively incorporates pseudo-samples into the learning process of new and old categories, thus enhancing the distinction between new and old categories and diminishing the mutual information between the input and intermediate features. Secondly, we introduce an attention-based feature distillation method that regulates the distillation strength between feature pairs based on their similarity, thereby augmenting the mutual information between intermediate features and output prediction. Extensive experiments on three benchmarks demonstrate that the proposed method exhibits significant incremental performance improvements over existing methods.
Knowledge Tracing (KT) is a crucial research task for dynamically monitoring students' knowledge states, particularly in online education systems. Recently, knowledge tracing has gained significant attention and in-depth research. Most existing methods rely on students' response data for question understanding and modeling, which helps better update students' knowledge states. Meanwhile, question IDs are utilized to indicate and represent questions. However, this presents a challenge when transitioning to new, cold-start questions that few students have answered before. Also, prior work has overlooked the semantic modeling of questions, which could better assist in modeling the transfer of students' knowledge states. In this paper, we explore leveraging the power of Large Language Models (LLMs) to help understand questions for knowledge tracing, which helps mitigate the cold-start and sparsity problems and model the transfer of students' knowledge states in a sophisticated manner. Specifically, we first design an attribute estimation module to estimate the attributes of the questions (e.g., difficulty, ability requirements, expected response time) by prompting Large Language Models. Subsequently, we develop a question embedding module that incorporates a graph attention network to effectively utilize these attributes. Extensive experiments on various datasets demonstrate that our model outperforms existing state-of-the-art models and effectively addresses the problems of cold start and sparsity. In addition, due to the estimation of multiple attributes of the questions, our model exhibits superior interpretability.
Counterfactual learning to rank (CLTR) can be risky and, in various circumstances, can produce sub-optimal models that hurt performance when deployed. Safe CLTR was introduced to mitigate these risks when using inverse propensity scoring to correct for position bias. However, the existing safety measure for CLTR is not applicable to state-of-the-art CLTR methods, cannot handle trust bias, and relies on specific assumptions about user behavior.
Our contributions are two-fold. First, we generalize the existing safe CLTR approach to make it applicable to state-of-the-art doubly robust CLTR and trust bias. Second, we propose a novel approach, proximal ranking policy optimization (PRPO), that provides safety in deployment without assumptions about user behavior. PRPO removes incentives for learning ranking behavior that is too dissimilar to a safe ranking model. Thereby, PRPO imposes a limit on how much learned models can degrade performance metrics, without relying on any specific user assumptions. Our experiments show that both our novel safe doubly robust method and PRPO provide higher performance than the existing safe inverse propensity scoring approach. However, in unexpected circumstances, the safe doubly robust approach can become unsafe and bring detrimental performance. In contrast, PRPO always maintains safety, even in maximally adversarial situations. By avoiding assumptions, PRPO is the first method with unconditional safety in deployment that translates to robust safety for real-world applications.
Recent advancements in sequential modeling applied to Electronic Health Records (EHR) have greatly influenced prescription recommender systems. While the recent literature on drug recommendation has shown promising performance, the study of discovering a diversity of coexisting temporal relationships at the level of medical codes over consecutive visits remains less explored. The goal of this study can be motivated from two perspectives. First, there is a need to develop a sophisticated sequential model capable of disentangling the complex relationships across sequential visits. Second, it is crucial to establish multiple and diverse health profiles for the same patient to ensure a comprehensive consideration of different medical intents in drug recommendation. To achieve this goal, we introduce Attentive Recommendation with Contrasted Intents (ARCI), a multi-level transformer-based method designed to capture the different but coexisting temporal paths across a shared sequence of visits. Specifically, we propose a novel intent-aware method with contrastive learning, that links specialized medical intents of the patients to the transformer heads for extracting distinct temporal paths associated with different health profiles. We conducted experiments on two real-world datasets for the prescription recommendation task using both ranking and classification metrics. Our results demonstrate that ARCI has outperformed the state-of-the-art prescription recommendation methods and is capable of providing interpretable insights for healthcare practitioners.
Columnar database systems can process complex mixed workloads on a single node. In case of increasing and peak analytical processing demand, we can offload read-only queries to replicas. Partial replication, i.e., duplicating only data subsets to additional nodes, is more cost-efficient than full replication for two primary reasons: (i) Partial replicas require less storage and can be set up faster. (ii) Partial replicas must synchronize only stored data subsets, allowing better scalability. However, determining which queries to offload is challenging for larger workloads because queries access overlapping data subsets and cause synchronization costs.
This paper shows how to calculate optimized replica configurations that consider reallocation and data modification costs using integer linear programming (ILP) techniques. While ILP is effective for solving assignment problems, it does not scale well. For larger problems, users often fall back to simple heuristics, which can lose optimization potential. This paper demonstrates that scalable heuristics can be built on ILP, preserving its strengths. The three proposed approaches for reducing the calculation time allow solution quality to be traded off flexibly. Our evaluations using TPC-H, TPC-DS, and a large real-world accounting workload show that our approach outperforms state-of-the-art solutions, often reducing reallocated data by more than 80% and halving modification costs. At the same time, the new allocations reduce the storage consumption by over 30%, with solutions computed in just a few seconds.
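As a rough illustration of the kind of formulation involved (a simplified assumption, not the paper's exact model), a query-offloading ILP can be written with binary variables assigning queries to replicas and data fragments to replicas:

```latex
% Schematic (simplified) ILP for offloading read-only queries to partial replicas:
% x_{qr} = 1 if query q runs on replica r; y_{fr} = 1 if data fragment f is stored on r.
\begin{aligned}
\min \quad & \sum_{f,r} \bigl( c^{\text{store}}_{f} + c^{\text{sync}}_{f} + c^{\text{realloc}}_{fr} \bigr)\, y_{fr} \\
\text{s.t.} \quad & \sum_{r} x_{qr} = 1 \quad \forall q
  && \text{(every query is assigned to exactly one replica)} \\
& x_{qr} \le y_{fr} \quad \forall q,\; r,\; f \in \mathrm{frag}(q)
  && \text{(a replica stores every fragment its queries read)} \\
& \sum_{q} \ell_{q}\, x_{qr} \le L_{r} \quad \forall r
  && \text{(per-replica load limit)} \\
& x_{qr},\, y_{fr} \in \{0,1\}.
\end{aligned}
```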
Cognitive diagnosis, a fundamental task in education assessment, aims to quantify students' proficiency levels based on historical test logs. However, the interactions between students and exercises are incomplete and even sparse, which means that only a few exercise scores of a specific student are observed. A key finding is that the pattern of this missingness is non-random, which can induce bias in the estimated proficiency values. To this end, we formulate cognitive diagnosis as a sample selection problem, where observations are sampled through non-random probabilities that correlate with both the student's response correctness and the features of the student and exercise. We propose a simple but effective method called HeckmanCD, adapting the Heckman two-stage approach to mitigate this endogeneity issue. We first employ an interaction model to predict the occurrence probability of a specific student-exercise pair. After that, a selection variable, derived from this interaction model, is incorporated as a controlled independent variable in the cognitive diagnosis framework. Our analysis reveals that the vanilla estimates of the item response theory model are inherently biased in the presence of confounders, and our method can correct this bias by capturing the covariance. The proposed HeckmanCD can be applied to most existing cognitive diagnosis models, including deep models, and the empirical evaluation demonstrates the effectiveness of our method while requiring no auxiliary information such as textual descriptions of exercises.
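For readers unfamiliar with the Heckman correction, the sketch below shows the generic two-stage recipe (a probit selection model, then the inverse Mills ratio added as a control variable in the outcome model); adapting this idea to cognitive diagnosis models is the paper's contribution and is not reproduced, and the variable names are assumptions.

```python
# Minimal sketch of the textbook Heckman two-stage correction for non-random missingness.
# Assumes numpy arrays: Z (selection covariates for all pairs), observed (0/1 mask),
# and Z_obs, X_obs, y_obs restricted to the observed student-exercise pairs.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

def heckman_two_stage(Z, observed, Z_obs, X_obs, y_obs):
    # Stage 1: probit model of whether a student-exercise pair is observed at all.
    probit = sm.Probit(observed, sm.add_constant(Z)).fit(disp=0)
    idx = sm.add_constant(Z_obs) @ probit.params          # linear index for observed rows
    mills = norm.pdf(idx) / norm.cdf(idx)                 # inverse Mills ratio (control variable)
    # Stage 2: outcome model augmented with the selection control variable.
    X_aug = sm.add_constant(np.column_stack([X_obs, mills]))
    outcome = sm.OLS(y_obs, X_aug).fit()
    return probit, outcome
```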
Current recommendation systems recommend goods by considering users' historical behaviors, social relations, ratings, and other multi-modal signals. Although historical user information reflects the trends of a user's interests, no recommendation system can truly know the user's real-time thoughts. With the development of brain-computer interfaces, it is time to explore next-generation recommenders that reflect users' real-time thoughts without delay. Electroencephalography (EEG) is a promising method of collecting brain signals because of its convenience and mobility. Currently, there is little research on EEG-based recommendation due to the complexity of learning human brain activity. To explore the utility of EEG-based recommendation, we propose a novel neural network model, QUARK, combining Quantum Cognition Theory and Graph Convolutional Networks for accurate item recommendation. Compared with state-of-the-art recommendation models, the superiority of QUARK is confirmed via extensive experiments.
Sarcasm is a form of language used to convey implicit information contradicting the literal meaning of words, often observed on online social media platforms. Accurately detecting satirical or ironic expressions could significantly enhance sentiment analysis and opinion mining. For multi-modal data, capturing both inter- and intra-modal incongruities is crucial for this task. Recently, graph-based approaches to modeling incongruous features between image and text have made significant progress on this task. However, these methods rely on static networks to capture incongruous features, which makes them inflexible in adapting to diverse groups of text and image, or they neglect important information due to inadequate use of text and image. To address these limitations, we propose a multi-modal sarcasm detection model based on the combination of a Graph Convolutional Network and a Dynamic Network. The graph convolutional network learns the incongruity of the three modal graphs and makes full use of the object-level information. The dynamic network dynamically captures the incongruity between the global-level image and the text and can flexibly adapt to different images and related text. At the same time, we generate augmented text to better utilize the textual information. Extensive experiments demonstrate that our proposed method performs favorably against state-of-the-art approaches.
Recommender systems play a crucial role in delivering personalized services to users, but the increasing volume of user data raises significant concerns about privacy, security, and utility, motivating machine unlearning, i.e., removing the influence of specified user data from a trained model upon request. However, existing machine unlearning methods cannot be directly applied to recommendation systems as they overlook the collaborative information shared across users and items. More recently, a method known as RecEraser was introduced, offering partitioning and aggregation-based approaches. Nevertheless, these approaches have limitations due to their inadequate handling of additional overhead costs. In this paper, we propose a General Strategy of Graph Collaborative Filtering for Recommendation Unlearning (GSGCF-RU), a novel model-agnostic learnable delete operator that optimizes unlearning edge consistency and feature representation consistency. Specifically, the GSGCF-RU model utilizes unlearning edge consistency to eliminate the influence of deleted elements, followed by feature representation consistency to retain knowledge after deletion. Lastly, experimental results on three real-world public benchmarks demonstrate that GSGCF-RU not only achieves efficient recommendation unlearning but also surpasses state-of-the-art methods in terms of model utility. The source code can be found at https://github.com/YongjingHao/GSGCF-RU.
Personalized item ranking has been a crucial component contributing to the performance of recommender systems. As a representative approach, pairwise ranking directly optimizes the ranking with user implicit feedback by constructing (user, positive item, negative item) triplets. Several recent works have noticed that treating all triplets equally may hardly achieve the best effects. They assign different importance scores to negative items, user-item pairs, or triplets, respectively. However, almost all the generated importance scores are groundless and hard to interpret, and thus far from trustworthy and transparent. To tackle these issues, we propose the Triplet Shapley, a Shapley value-based method to measure the triplet importance in an interpretable manner. Due to the huge number of triplets, we transform the original Shapley value calculation into a Monte Carlo (MC) approximation. To stabilize the MC approximation, we adopt a control variates-based scheme. Finally, we utilize the triplet Shapley values to guide the resampling of important triplets to facilitate the model learning. Extensive experiments are conducted on six public datasets involving classical matrix factorization- and graph neural network-based recommendation models to demonstrate the superiority of our method.
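For intuition only, the following is a minimal sketch of the permutation-sampling Monte Carlo estimator that such Shapley-based importance scores typically rely on. It is not the paper's exact procedure: the value function and the toy per-triplet utilities are hypothetical placeholders, and the control-variate stabilization mentioned in the abstract is omitted.

```python
import numpy as np

def mc_shapley(players, value_fn, num_permutations=200, rng=None):
    """Permutation-sampling Monte Carlo estimate of Shapley values.

    players  : list of hashable ids (here: triplet indices)
    value_fn : callable mapping a coalition (set of players) to a scalar
               utility, e.g. validation ranking quality of a model trained
               on that subset of triplets (hypothetical placeholder).
    """
    rng = rng or np.random.default_rng(0)
    phi = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = rng.permutation(players)
        coalition, prev_value = set(), value_fn(set())
        for p in order:
            coalition.add(p)
            cur_value = value_fn(coalition)
            phi[p] += cur_value - prev_value   # marginal contribution of p
            prev_value = cur_value
    return {p: v / num_permutations for p, v in phi.items()}

# Toy usage: if the utility is just the sum of per-triplet "gains",
# the estimates recover the individual gains.
gains = {0: 0.5, 1: 0.1, 2: 0.4}
est = mc_shapley(list(gains), lambda S: sum(gains[p] for p in S))
print(est)
```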
Weather forecasting has become a popular research topic recently, mainly benefiting from the development of spatio-temporal neural networks that effectively extract useful patterns from weather data. Generally, the weather changes in the meteorological system are governed by physical principles. However, it is challenging for spatio-temporal methods to capture the physical knowledge of meteorological dynamics. To address this problem, we propose in this paper a spatio-temporal Transformer network with physical knowledge distillation (PKD-STTN) for weather forecasting. First, the teacher network is implemented by a differential equation network that models weather changes through the potential energy in the atmosphere to reveal the physical mechanism of atmospheric movements. Second, the student network uses a spatio-temporal Transformer that concurrently utilizes three attention modules to comprehensively capture the semantic spatial correlation, geographical spatial correlation, and temporal correlation from weather data. Finally, the physical knowledge of the teacher network is transferred to the student network by inserting a distillation position encoding into the Transformer. Note that the output of the teacher network is distilled into the position encoding rather than into the output of the student network, which allows physical knowledge to be exploited without influencing the feature extraction process of the Transformer. Experiments on benchmark datasets show that the proposed method can effectively utilize physical principles of weather changes and has clear performance advantages over several strong baselines.
Algorithmic fairness has been receiving increasing attention in recent years. Among others, individual fairness, with its root in the dictionary definition of fairness, offers a fine-grained fairness notion. At the algorithmic level, individual fairness can often be operationalized as a convex regularization term with respect to a similarity matrix. Appealing as it might be, a notorious challenge of individual fairness lies in how to find an appropriate distance or similarity measure, which largely remains open to date. Consequently, the similarity or distance measure used in almost any individually fair algorithm is likely to be imperfect due to various reasons such as imprecise prior/domain knowledge, noise, or even adversaries. In this paper, we take an important step towards resolving this fundamental challenge and ask: how sensitive is an individually fair learning algorithm with respect to the given similarities? How can we make the learning results robust with respect to the imperfection of the given similarity measure? First (Soul-M), we develop a sensitivity measure to characterize how the learning outcomes of an individually fair learning algorithm change in response to the change of the given similarity measure. Second (Soul-A), based on the proposed sensitivity measure, we further develop a robust individually fair algorithm by adversarial learning that optimizes the similarity matrix to defend against L_∞ attacks. A unique advantage of our sensitivity measure and robust algorithm lies in that they are applicable to a broad range of learning models as long as the objective function is twice differentiable. We conduct extensive experiments to demonstrate the efficacy of our methods.
Source localization in social platforms is critical for managing and controlling the spread of misinformation. Despite all the recent advancements, existing methods do not consider the dynamic and heterogeneous propagation behaviors of users and are developed based on simulated data with strong model assumptions, limiting their application in real-world scenarios. This research addresses this limitation by presenting a novel framework for source localization, grounded in real-world propagation cascades from platforms like Weibo and Twitter. Moreover, recognizing the user-driven nature of information spread, we systematically crawl and integrate user-specific profiles, offering a realistic understanding of user-driven propagation dynamics. In summary, by developing datasets derived from real-world propagation cascades, we set a precedent in enhancing the authenticity and practicality of source identification for social media. Our comprehensive experiments not only validate the feasibility and rationale of our novel user-centric localization approaches but also emphasize the significance of considering user profiles in real-world propagation scenarios. The code is available at https://github.com/cgao-comp/NFSL.
Novel Class Discovery (NCD) involves identifying new categories within unlabeled data by utilizing knowledge acquired from previously established categories. However, existing NCD methods often struggle to maintain a balance between the performance on old and new categories. Discovering unlabeled new categories in a class-incremental way is more practical but also more challenging, as it is frequently hindered by either catastrophic forgetting of old categories or an inability to learn new ones. Furthermore, the implementation of NCD on continuously scalable graph-structured data remains an under-explored area. In response to these challenges, we introduce for the first time a more practical NCD scenario for node classification (i.e., NC-NCD), and propose a novel self-training framework with prototype replay and distillation called SWORD, adapted to our NC-NCD setting. Our approach enables the model to cluster unlabeled new-category nodes after learning labeled nodes while preserving performance on old categories without reliance on old-category nodes. SWORD achieves this by employing a self-training strategy to learn new categories and preventing the forgetting of old categories through the joint use of feature prototypes and knowledge distillation. Extensive experiments on four common benchmarks demonstrate the superiority of SWORD over other state-of-the-art methods.
This paper enhances option pricing accuracy by incorporating financial expertise into a neural network (NN) design and optimizing data sample quality through cleaning and synthesis. Instead of directly estimating option values (OVs) with NNs, we leverage the concept of control variates by decomposing OVs into time values (TVs) estimated by NNs, plus the analytically solvable intrinsic values (IVs). The TV surface can be decomposed into two scenarios with very different properties, and we design two NNs according to our derived no-arbitrage constraints for these two scenarios. To alleviate learning inaccuracy due to the kink of the TV surface along the scenario boundary, we synthesize training samples based on our derived constraints to smoothly extend the surface for each scenario. On the other hand, irrational option quotes commonly found in illiquid markets incur uneven surfaces, significantly deteriorating NN predictability. We develop a learnable data-cleaning method to properly remove potentially irrational quotes spotted by no-arbitrage constraints. Besides, unnecessary data syntheses proposed in previous literature can also be removed by incorporating corresponding constraints into our NN to enhance training efficiency. Comprehensive experiments on the liquid S&P 500 and illiquid TAIEX option markets demonstrate the superiority of our approach.
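To make the OV = IV + TV decomposition concrete: for a vanilla call, the intrinsic value is the analytically known payoff of immediate exercise, so the network only has to learn the remaining time value. The sketch below is illustrative only; it uses a plain call payoff and does not reproduce the paper's two-scenario split or no-arbitrage constraints.

```python
import numpy as np

def intrinsic_value_call(spot, strike):
    # IV of a call: payoff if exercised now (analytically known).
    return np.maximum(spot - strike, 0.0)

def time_value_target(market_price, spot, strike):
    # The NN regresses this residual instead of the full option value.
    return market_price - intrinsic_value_call(spot, strike)

def option_value(spot, strike, tv_predicted):
    # Recompose: OV = IV + TV, mirroring a control-variate style split.
    return intrinsic_value_call(spot, strike) + tv_predicted

# Toy quote: spot 105, strike 100, market price 8.3 -> TV target 3.3
print(time_value_target(8.3, 105.0, 100.0))
```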
Timeline summarization involves condensing events from news articles to illustrate the temporal development of a specific topic. Traditional methods often extract events based on the number of related reports but tend to overlook the movement of protagonists, the leading actors participating in events that shape the progression of the topic. This oversight can result in the extraction of sensationalized events unrelated to the topic's progression, distracting readers from tracking the topic's development. To address this limitation, we propose a novel strategy that identifies protagonists through dependency relations and tracks changes in the context surrounding them over time using a multi-faceted temporal graph. This temporal graph is a sequence of graphs that effectively captures information progression and shifts over time. Our approach aims to build a biographical timeline with accurate chronology by identifying and following the movement of protagonists. Our experiments demonstrate that our method, PIECE, outperforms previous approaches in date assignment for timeline summarization across different language datasets.
Cross-domain Recommendation (CDR) aims to alleviate the data sparsity and cold-start problems in traditional recommender systems by leveraging knowledge from an informative source domain. However, previously proposed CDR models rest on the imprudent assumption that all information from the source domain contributes equally to the target domain, neglecting the part that is completely irrelevant to users' intrinsic interests. To address this concern, in this paper we propose a novel knowledge-enhanced cross-domain recommendation framework named CoTrans, which remolds the core procedures of CDR models with: Compression of the knowledge from the source domain and Transfer of the purified knowledge to the target domain. Specifically, following the theory of the Graph Information Bottleneck, CoTrans first compresses the source behaviors with the perception of information from the target domain. Then, to preserve all the important information for the CDR task, the feedback signals from both domains are utilized to promote the effectiveness of the transfer procedure. Additionally, a knowledge-enhanced encoder is employed to narrow the gaps caused by non-overlapping items across the separate domains. Comprehensive experiments on three widely used cross-domain datasets demonstrate that CoTrans significantly outperforms both single-domain and state-of-the-art cross-domain recommendation approaches.
Spatio-temporal graph neural networks have proven efficacy in capturing complex dependencies for urban computing tasks such as forecasting and kriging. Yet, their performance is constrained by the reliance on extensive data for training on a specific task, thereby limiting their adaptability to new urban domains with varied task demands. Although transfer learning has been proposed to remedy this problem by leveraging knowledge across domains, cross-task generalization remains under-explored in spatio-temporal graph transfer learning due to the lack of a unified framework. To bridge the gap, we propose Spatio-Temporal Graph Prompting (STGP), a prompt-based framework capable of adapting to diverse tasks in a data-scarce domain. Specifically, we first unify different tasks into a single template and introduce a task-agnostic network architecture that aligns with this template. This approach enables capturing dependencies shared across tasks. Furthermore, we employ learnable prompts to achieve domain and task transfer in a two-stage prompting pipeline, facilitating the prompts to effectively capture domain knowledge and task-specific properties. Our extensive experiments demonstrate that STGP outperforms state-of-the-art baselines in three tasks (forecasting, kriging, and extrapolation), achieving an improvement of up to 10.7%.
Empathetic response generation is designed to comprehend the emotions of others and select the most appropriate strategies to assist them in resolving emotional challenges. Empathy can be categorized into cognitive empathy and affective empathy. The former pertains to the ability to understand and discern the emotional issues and situations of others, while the latter involves the capacity to provide comfort. To enhance one's empathetic abilities, it is essential to develop both of these aspects. Therefore, we develop an innovative framework that combines retrieval augmentation and emotional support strategy integration. Our framework starts with the introduction of a comprehensive emotional palette for empathy. We then apply appraisal theory to decompose this palette and create a database of empathetic responses. This database serves as an external resource and enhances the LLM's empathy by integrating semantic retrieval mechanisms. Moreover, our framework places a strong emphasis on the proper articulation of response strategies. By incorporating emotional support strategies, we aim to enrich the model's capabilities in both cognitive and affective empathy, leading to a more nuanced and comprehensive empathetic response. Finally, we extract the datasets ED and ET from the empathetic dialogue datasets EmpatheticDialogues and ExTES based on dialogue length. Experiments demonstrate that our framework can enhance the empathy ability of LLMs from both cognitive and affective empathy perspectives. Our code is released at https://github.com/CAS-SIAT-XinHai/APTNESS.
Cross-Domain Recommendation (CDR) has received widespread attention due to its ability to utilize rich information across domains. However, most existing CDR methods assume an ideal static condition that is not practical in industrial recommendation systems (RS). Therefore, simply applying existing CDR methods in the industrial RS environment may lead to low effectiveness and efficiency. To fill this gap, we propose DIIT, an end-to-end Domain-Invariant Information Transfer method for industrial cross-domain recommendation. Specifically, we first simulate the industrial RS environment that maintains respective models in multiple domains, each of which is trained in incremental mode. Then, to improve effectiveness, we design two extractors to fully extract domain-invariant information from the latest source domain models at the domain level and the representation level, respectively. Finally, to improve efficiency, we design a migrator to transfer the extracted information to the latest target domain model, which only needs the target domain model for inference. Experiments conducted on one production dataset and two public datasets verify the effectiveness and efficiency of DIIT.
Entity Set Expansion (ESE) is a critical task aiming at expanding entities of the target semantic class described by seed entities. Most existing ESE methods are retrieval-based frameworks that need to extract contextual features of entities and calculate the similarity between seed entities and candidate entities. To achieve these two purposes, they iteratively traverse the corpus and the entity vocabulary, resulting in poor efficiency and scalability. Experimental results indicate that the time consumed by retrieval-based ESE methods increases linearly with entity vocabulary and corpus size. In this paper, we are the first to propose a Generative Entity Set Expansion (GenExpan) framework, which utilizes a generative pre-trained auto-regressive language model to accomplish the ESE task. Specifically, a prefix tree is employed to guarantee the validity of entity generation, and automatically generated class names are adopted to guide the model to generate target entities. Moreover, we propose Knowledge Calibration and Generative Ranking to further bridge the gap between the generic knowledge of the language model and the goal of the ESE task. In terms of efficiency, the expansion time consumed by GenExpan is independent of entity vocabulary and corpus size, and GenExpan achieves an average 600% speedup compared to strong baselines. In terms of expansion effectiveness, our framework outperforms previous state-of-the-art ESE methods.
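As a rough sketch of what prefix-tree-constrained entity generation can look like in practice (this is not GenExpan's actual implementation; the model, prompt, and entity list are illustrative assumptions), the HuggingFace prefix_allowed_tokens_fn hook can restrict each decoding step to the children of the current trie path:

```python
# Sketch: constrain generation to a trie built from candidate entity names.
# Assumes the HuggingFace `transformers` library; all names are toy examples.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

entities = ["New York", "New Jersey", "Boston"]
trie = {}
for ent in entities:
    node = trie
    for tid in tok.encode(" " + ent):
        node = node.setdefault(tid, {})
    node[tok.eos_token_id] = {}          # mark the end of a valid entity

prompt_ids = tok("US states and cities such as", return_tensors="pt").input_ids
prompt_len = prompt_ids.shape[1]

def allowed_tokens(batch_id, input_ids):
    node = trie
    for tid in input_ids[prompt_len:].tolist():   # walk the trie with generated tokens
        node = node.get(tid, {})
    return list(node) or [tok.eos_token_id]

out = model.generate(prompt_ids, max_new_tokens=8,
                     prefix_allowed_tokens_fn=allowed_tokens)
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))
```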
The surge in merchant fraud poses a significant threat to market order and consumer security. Effective security monitoring for merchants is crucial in safeguarding the digital life ecosystem and users' financial well-being. Detecting daily fraudulent payment transactions, a challenging task for current methods, requires efficient transformation of transactions into embeddings, especially in representing merchants based on their behavioral transactions. To address this, we propose the Grouping Sampling-based Sequence Generation (GSSG) method to generate meaningful sequences, enabling interactions among correlated transactions. We introduce Hierarchical Embedding Learning (HEL) and Hierarchical Masking pre-training (HMP) for the effective representation of hierarchical structures within flat transaction sequences. Pretrained on WeChat Pay data, our model, PTP, demonstrates superior performance in downstream fraud transaction detection, especially in few-shot learning scenarios, showcasing great potential in payment transaction scenarios.
Large Language Models (LLMs) face challenges due to hallucination issues. Current solutions use retrieval-augmented generation (RAG), integrating LLMs with external knowledge to enhance answer accuracy. However, the misuse of irrelevant external knowledge can be misleading. In this paper, we propose a novel method called Retrieve-and-Discriminate Prompter (RD-P), which leverages knowledge graphs (KGs) for trustworthy RAG by synchronizing knowledge retrieval and discrimination in a unified model. Specifically, we train a prompter based on a pre-trained language model with shared parameters. It has two key modules: the retriever and the discriminator. The retriever identifies relevant reasoning paths in the KG, while the discriminator evaluates their credibility through "logical coverage calculation" and in turn instructs the retrieval process. Prompts are then constructed to guide LLMs in reasoning and answering questions using both retrieved and implicit knowledge. Experiments on knowledge-intensive question answering (QA) tasks demonstrate that our method significantly improves answer coverage rate while reducing the retrieval scale, achieving superior performance in complex KGQA tasks compared with state-of-the-art RAG methods at a low cost.
This paper delves into the interpretability of Graph Neural Networks in the context of Boolean Satisfiability. The goal is to demystify the internal workings of these models and provide insightful perspectives into their decision-making processes. This is done by uncovering connections to two approximation algorithms studied in the domain of Boolean Satisfiability: Belief Propagation and Semidefinite Programming Relaxations. Revealing these connections has empowered us to introduce a suite of impactful enhancements. The first significant enhancement is a curriculum training procedure, which incrementally increases the problem complexity in the training set, together with increasing the number of message passing iterations of the Graph Neural Network. We show that the curriculum, together with several other optimizations, reduces the training time by more than an order of magnitude compared to the baseline without the curriculum. Furthermore, we apply decimation and sampling of initial embeddings, which significantly increase the percentage of solved problems.
How can we efficiently analyze a specific time range on an irregular tensor? PARAFAC2 decomposition is widely used when analyzing an irregular tensor which consists of several matrices with different row sizes. A crucial task related to PARAFAC2 decomposition is to analyze sub-tensors corresponding to various time ranges of a given tensor, instead of analyzing the entire tensor. Although many recent works have developed efficient PARAFAC2 decomposition methods, existing PARAFAC2 decomposition methods are inappropriate for addressing various time range queries, as they need to decompose sub-tensors from scratch.
In this paper, we propose Repeat, a fast and accurate PARAFAC2 decomposition method for handling arbitrary time range queries on irregular tensors. To avoid decomposing sub-tensors of queries from scratch, Repeat obtains preprocessed results that support efficient query answering before time ranges are given. For time range queries, Repeat efficiently computes the PARAFAC2 decomposition for the sub-tensors corresponding to the queries by using preprocessed results rather than the original irregular tensor. We experimentally demonstrate that Repeat outperforms existing PARAFAC2 methods, providing up to 12x faster speed while having comparable errors. We also present a case study for the use of Repeat in detecting locally appearing patterns through a variety of time range queries.
Calibrated recommendation, which aims to maintain personalized proportions of categories within recommendations, is crucial in practical scenarios since it enhances user satisfaction by reflecting diverse interests. However, achieving calibration in a sequential setting (i.e., calibrated sequential recommendation) is challenging due to the need to adapt to users' evolving preferences. Previous methods typically leverage reranking algorithms to calibrate recommendations after training a model without considering the effect of calibration, and they do not effectively tackle the conflict between relevance and calibration during the reranking process. In this work, we propose LeapRec (Calibration-Disentangled Learning and Relevance-Prioritized Reranking), a novel approach for calibrated sequential recommendation that addresses these challenges. LeapRec consists of two phases: a model training phase and a reranking phase. In the training phase, a backbone model is trained using our proposed calibration-disentangled learning-to-rank loss, which optimizes personalized rankings while integrating calibration considerations. In the reranking phase, relevant items are prioritized at the top of the list, with items needed for calibration following later to address potential conflicts between relevance and calibration. Through extensive experiments on four real-world datasets, we show that LeapRec consistently outperforms previous methods in calibrated sequential recommendation. Our code is available at https://github.com/jeon185/LeapRec.
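For intuition about the relevance-versus-calibration trade-off in a reranking phase, here is a generic greedy calibration-aware reranker in the spirit of calibrated recommendation. It is not LeapRec's algorithm: the relevance scores, category distributions, miscalibration measure, and trade-off weight are all hypothetical.

```python
def greedy_calibrated_rerank(scores, item_cats, target_dist, k, lam=0.5):
    """Greedily pick k items trading off relevance against calibration.

    scores      : dict item -> relevance score from the base model
    item_cats   : dict item -> category
    target_dist : dict category -> desired proportion in the final list
    lam         : weight of the calibration term (hypothetical knob)
    """
    chosen, cat_counts = [], {c: 0 for c in target_dist}
    for _ in range(k):
        best, best_util = None, float("-inf")
        for item, rel in scores.items():
            if item in chosen:
                continue
            cat_counts[item_cats[item]] += 1          # tentatively add the item
            n = len(chosen) + 1
            # L1 miscalibration of the tentative list's category distribution
            miscal = sum(abs(cat_counts[c] / n - p) for c, p in target_dist.items())
            cat_counts[item_cats[item]] -= 1          # undo the tentative add
            util = (1 - lam) * rel - lam * miscal
            if util > best_util:
                best, best_util = item, util
        chosen.append(best)
        cat_counts[item_cats[best]] += 1
    return chosen

scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2}
cats = {"a": "action", "b": "action", "c": "drama", "d": "drama"}
print(greedy_calibrated_rerank(scores, cats, {"action": 0.5, "drama": 0.5}, k=2))
```

With these toy numbers the reranker picks "a" then "c", accepting a lower-scored item to keep the genre mix close to the target proportions.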
Beyond the traditional CNN structure, we have recently witnessed many breakthroughs in computer vision architectures such as the Vision Transformer, MLP-Mixer, SNN-MLP, and so on. However, many efforts in developing novel architectures for vision tasks are heavily focused on achieving powerful performance, and how to attain interpretability in a trained neural network remains an open question. Inspired by the imaginary system GLOM, we present HiLite: Hierarchical Level-implemented Architecture attaining Part-Whole Interpretability, where islands of identical vectors can provide unprecedented interpretability. In our column-like structure, each level is a layer of a part-whole hierarchy composed of multiple neurons, and the function that defines the neural field along an image input patch is initialized as the level vector inside the model. We propose two column networks (Top-Down (TD) and Bottom-Up (BU)) that allow inter-level communication between adjacent levels on a specific patch, and propose Gated Consensus Attention to perform intra-level communication on different patches within the level. At each time step, the level vector and the outputs from the different networks are combined into a weighted sum and passed to the next step, and the outputs from the final time step are utilized as representation vectors. Here, supervised contrastive learning is used to find the relationship of meaningful patches in each class, where negative examples contribute to preventing representation collapse between neighboring patches. HiLite demonstrates promising performance in a quantitative evaluation on four image classification datasets as well as on two metrics for assessing representation quality, and showcases intrinsic interpretability by simply generating a visual cue. We believe that our work is a solid step towards novel research on neural architectures attaining interpretability.
As the mobile gaming market experiences significant growth, there is a continuous emergence of new gaming products such as premium games and video mini-games. Meeting their marketing needs and supporting their business growth is essential for long-term prosperity. However, compared with the extensive studies on user modeling such as LTV prediction, much less attention has been drawn to special gaming products, especially in terms of understanding their lifecycle stages and corresponding demands. Unlike modeling individual users, understanding games is closely tied to user behavior: the lifecycle of a game encompasses the entire process from initial user interaction to churn, and accurately identifying and tracking the evolution of the game lifecycle can lead to better personalized service. This raises the necessity of a comprehensive lifecycle process model of the game. In this paper, we introduce GameTrail, a Probabilistic Lifecycle Process Model designed to construct the complete lifecycle and stage representation for games and users through long-term repeated interactions. Specifically, we first model the complete game lifecycle using a joint probabilistic stochastic process model, defining the lifecycle stages of both games and users as latent variables and learning them via Bayesian Variational Inference. Furthermore, we employ cross attention and online embedding learning to capture the more recent advertising context changes in the in-game stage transitions. Finally, we collect various games' data from public resources to construct an experimental dataset for our experiments. Meanwhile, more comprehensive experiments conducted on real-world gaming industry datasets showcase the effectiveness of our approach, showing a relative improvement of 48% and 17% in NMSE and NMAE over the live baseline.
Urban flow prediction is a spatio-temporal modelling task that estimates the throughput of transportation services like buses, taxis, and ride-sharing, where data-driven models have become the most popular solution in the past decade. Meanwhile, the implicitly learned mapping from historical observations to prediction targets tends to over-simplify the dynamics of real-world urban flows, leading to suboptimal predictions. Some recent spatio-temporal prediction solutions bring remedies with the notion of physics-guided machine learning (PGML), which describes spatio-temporal data with nuanced and principled physics laws, thus enhancing both the prediction accuracy and interpretability. However, these spatio-temporal PGML methods are built upon a strong assumption that the observed data fully conforms to the differential equations that define the physical system, which can quickly become ill-posed in urban flow prediction tasks. The observed urban flow data, especially when sliced into time-dependent snapshots to facilitate predictions, is typically incomplete and sparse, and prone to inherent noise incurred in the collection process (e.g., uncalibrated traffic sensors). As a result, such physical inconsistency between the data and the PGML model significantly limits the predictive power and robustness of the solution. Moreover, due to the interval-based predictions and intermittent nature of data filing (e.g., one record per 30 minutes) in many transportation services, the instantaneous dynamics of urban flows can hardly be captured, rendering differential equation-based continuous modelling a loose fit for this setting. To overcome these challenges, we develop a discretized physics-guided network (PN), and propose a data-aware framework, Physics-guided Active Sample Reweighting (P-GASR), to enhance PN. Technically, P-GASR incorporates an active sample reweighting pipeline, which not only minimizes the model uncertainty of PN to enhance robustness, but also prioritizes data samples that exhibit higher physical compliance to reinforce their contribution to PN training. Experimental results on four real-world datasets demonstrate that our method achieves state-of-the-art performance with a demonstrable improvement in robustness. The code is released at https://github.com/WeiJiang01/P-GASR.
Recently, federated learning (FL) has achieved wide success for diverse privacy-sensitive applications without sacrificing the sensitive private information of clients. However, the data quality of client datasets cannot be guaranteed, since the corresponding annotations of different clients often contain complex label noise of varying degrees, which inevitably causes performance degradation. Intuitively, this degradation is dominated by clients with higher noise rates, since their trained models contain more misinformation from the data; thus it is necessary to devise an effective optimization scheme to mitigate the negative impacts of these noisy clients. In this work, we propose a two-stage framework, FedELC, to tackle this complicated label noise issue. The first stage aims to guide the detection of noisy clients with higher label noise, while the second stage aims to correct the labels of noisy clients' data via an end-to-end label correction framework, which learns possible ground-truth labels of noisy clients' datasets via back propagation. We implement sixteen related methods and evaluate them on five datasets with three types of complicated label noise scenarios for a comprehensive comparison. Extensive experimental results demonstrate that our proposed framework achieves superior performance over its counterparts in different scenarios. Additionally, we effectively improve the data quality of detected noisy clients' local datasets with our label correction framework. The code is available at https://github.com/Sprinter1999/FedELC.
Spatiotemporal time series are usually collected via monitoring sensors placed at different locations, which usually contain missing values due to various failures, such as mechanical damages and Internet outages. Imputing the missing values is crucial for analyzing time series. When recovering a specific data point, most existing methods consider all the information relevant to that point regardless of the cause-and-effect relationship. During data collection, it is inevitable that some unknown confounders are included, e.g., background noise in time series and non-causal shortcut edges in the constructed sensor network. These confounders could open backdoor paths and establish non-causal correlations between the input and output. Over-exploiting these non-causal correlations could cause overfitting. In this paper, we first revisit spatiotemporal time series imputation from a causal perspective and show how to block the confounders via the frontdoor adjustment. Based on the results of frontdoor adjustment, we introduce a novel Causality-Aware Spatiotemporal Graph Neural Network (Casper), which contains a novel Prompt Based Decoder (PBD) and a Spatiotemporal Causal Attention (SCA). PBD could reduce the impact of confounders and SCA could discover the sparse causal relationships among embeddings. Theoretical analysis reveals that SCA discovers causal relationships based on the values of gradients. We evaluate Casper on three real-world datasets, and the experimental results show that Casper could outperform the baselines and could effectively discover the causal relationships.
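For reference, the textbook frontdoor adjustment (in Pearl's notation, with treatment X, outcome Y, and a mediator M shielding their relationship from the unobserved confounder; the abstract does not spell out the paper's exact instantiation) reads:

```latex
P\bigl(Y \mid do(X=x)\bigr) \;=\; \sum_{m} P(m \mid x) \sum_{x'} P\bigl(Y \mid x', m\bigr)\, P(x')
```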
With the increasing application of deep learning to solve scientific problems in biochemistry, molecular federated learning has become popular due to its ability to offer distributed privacy-preserving solutions. However, most existing molecular federated learning methods rely on joint training with public datasets, which are difficult to obtain in practice. These methods also fail to leverage multi-modal molecular representations effectively. To address the above issues, we propose a novel framework, Federated Heterogeneous Contrastive Distillation (FedHCD), which enables global models to be jointly trained from clients with heterogeneous data modalities, learning tasks, and molecular models. To aggregate data representations of different modalities in a data-free manner, we design a global multi-modal contrastive strategy to align the representations of clients without a public dataset. Utilizing intrinsic characteristics of molecular data in different modalities, we tackle the exacerbation of local model drift and data non-IIDness caused by multi-modal clients. We introduce a multi-view contrastive knowledge transfer to extract features from atoms, substructures, and molecules, solving the issue of information distillation failure due to dimensional biases in different data modalities. Our evaluations on eight real-world molecular datasets and ablation experiments show that FedHCD outperforms other state-of-the-art FL methods, irrespective of whether or not they use public datasets.
The utilization of automated depression detection significantly enhances early intervention for individuals experiencing depression. Despite numerous proposals on automated depression detection using recorded clinical interview videos, limited attention has been paid to the hierarchical structure of the interview questions. In clinical interviews for diagnosing depression, clinicians use a structured questionnaire that includes routine baseline questions and follow-up questions to assess the interviewee's condition. This paper introduces HiQuE (Hierarchical Question Embedding network), a novel depression detection framework that leverages the hierarchical relationship between primary and follow-up questions in clinical interviews. HiQuE can effectively capture the importance of each question in diagnosing depression by learning mutual information across multiple modalities. We conduct extensive experiments on the widely-used clinical interview dataset DAIC-WOZ, where our model outperforms other state-of-the-art multimodal depression detection and emotion recognition models, showcasing its clinical utility in depression detection.
Multi-label data is prevalent across various applications, where instances can be annotated with a set of classes. Although multi-label data can take various forms, such as images and text, tabular multi-label data stands out as the predominant data type in many real-world scenarios. Over the past decades, numerous methods have been proposed for tabular multi-label classification. Effectively addressing challenges like class imbalance, correlation among labels and features, and scalability is crucial for a high-performance multi-label classifier. However, many existing methods fall short of fully considering the correlation between labels and features. In cases where attempts are made, they often incur high computational costs, rendering them impractical for large datasets. This paper introduces an innovative classification method for tabular multi-label data, utilizing a fusion of transformers and graph convolutional networks (GCN). The central concept of the proposed approach involves transforming tabular data into images, leveraging state-of-the-art methods in image processing, including image-based transformers and pre-trained models, to capture correlation among labels effectively. Our approach jointly learns the representation of the feature space and the correlation among labels within a unified network. To substantiate the performance of our proposed method, we conducted a rigorous series of experiments across diverse multi-label datasets. The results underscore the superior performance and scalability of our approach compared to other existing state-of-the-art methods. This work not only contributes a novel perspective to the field of tabular multi-label classification but also showcases advancements in both accuracy and scalability.
We introduce a novel embedding method that diverges from conventional approaches by operating within finite-dimensional function spaces rather than finite-dimensional vector spaces, thus departing significantly from standard knowledge graph embedding techniques. Initially employing polynomial functions to compute embeddings, we progress to more intricate representations using neural networks with varying layer complexities. We argue that employing functions for embedding computation enhances expressiveness and allows for more degrees of freedom, enabling operations such as composition, derivatives, and primitives of entity representations. Additionally, we meticulously outline the step-by-step construction of our approach and provide code for reproducibility, thereby facilitating further exploration and application in the field.
In data analysis, unsupervised anomaly detection holds an important position for identifying statistical outliers that signify atypical behavior, erroneous readings, or interesting patterns within data. The Transformer model, known for its ability to capture dependencies within sequences, has revolutionized areas such as text and image data analysis. However, its potential for tabular data, where sequence dependencies are not inherently present, remains underexplored. This paper introduces Transformer for Point Anomaly Detection (TransPAD), a novel Transformer-based AutoEncoder framework specifically designed for point anomaly detection. Our method captures interdependencies across entire datasets, addressing the challenges posed by non-sequential, tabular data. It incorporates unique random and criteria sampling strategies for effective training and anomaly identification, and avoids the common pitfall of trivial generalization that affects many conventional methods. By leveraging an attention weight-based anomaly scoring system, TransPAD offers a more precise approach to detecting anomalies. Extensive testing on a range of benchmark tabular datasets shows that TransPAD consistently outperforms existing methods. Our source code is available at https://github.com/nth221/TransPAD.
Unsupervised anomaly detection is a daunting task, as it relies solely on normality patterns from the training data to identify unseen anomalies during testing. Recent approaches have focused on leveraging domain-specific transformations or perturbations to generate synthetic anomalies from normal samples. The objective here is to acquire insights into normality patterns by learning to differentiate between normal samples and these crafted anomalies. However, these approaches often encounter limitations when domain-specific transformations are not well-specified, such as in tabular data, or when it becomes trivial to distinguish between them. To address these issues, we introduce a novel domain-agnostic method that employs a set of conditional perturbators and a discriminator. The perturbators are trained to generate input-dependent perturbations, which are subsequently utilized to construct synthetic anomalies, and the discriminator is trained to distinguish normal samples from them. We ensure that the generated anomalies are both diverse and hard to distinguish through two key strategies: i) directing perturbations to be orthogonal to each other, and ii) constraining perturbations to remain in proximity to normal samples. Through experiments on real-world datasets, we demonstrate the superiority of our method over state-of-the-art benchmarks, which is evident not only in image data but also in tabular data, where domain-specific transformations are not readily accessible. Additionally, we empirically confirm the adaptability of our method to semi-supervised settings, demonstrating its capacity to incorporate supervised signals to enhance anomaly detection performance even further.
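A rough sketch of the two constraints on generated perturbations, as illustrative loss terms only (the tensor shapes and weighting are assumptions, not the authors' exact objective): pairwise similarities between perturbators are pushed toward orthogonality, and perturbation norms are kept small so that synthetic anomalies stay close to the normal samples.

```python
import torch
import torch.nn.functional as F

def perturbation_regularizers(perturbations):
    """perturbations: tensor of shape (K, B, D) holding the outputs of
    K conditional perturbators for a batch of B normal samples."""
    K = perturbations.shape[0]
    flat = F.normalize(perturbations.reshape(K, -1), dim=1)
    cos = flat @ flat.t()                                  # (K, K) pairwise similarities
    off_diag = cos - torch.eye(K, device=cos.device)
    ortho_loss = (off_diag ** 2).sum() / max(K * (K - 1), 1)  # push toward orthogonality
    proximity_loss = perturbations.norm(dim=-1).mean()        # keep anomalies near normals
    return ortho_loss, proximity_loss

perturb = torch.randn(4, 32, 16)   # toy: 4 perturbators, batch of 32, feature dim 16
ortho, prox = perturbation_regularizers(perturb)
print(ortho.item(), prox.item())
```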
The goal of dynamic signed network embedding (DSNE) is to represent the nodes in a dynamic signed network (DSN) as embeddings that preserve the evolving nature of conflicting relationships between nodes. While existing DSNE methods are useful for understanding polarization between users in diverse domains, they fail to consider the concept of a community boundary that contributes to network-wide polarization and lack inductive ability due to their reliance on homophily bias. To address these limitations, we propose a novel DSNE method, named PolarDSN, which learns the evolution of network POLARization and enhances inductive ability for Dynamic Signed Networks. It leverages node-level community boundaries as well as structural characteristics of nodes such as structural isomorphism and temporal transitivity. Experiments on four real-world DSN datasets demonstrate that PolarDSN consistently and significantly outperforms 12 state-of-the-art methods, achieving up to 31.6% and 21.1% improvement in macro-F1 for transductive and inductive settings, respectively. The code is available at https://github.com/kmj0792/PolarDSN.
Deep Neural Networks (DNNs) tend to perform poorly on unseen domains due to domain shifts. Domain Generalization (DG) aims to improve the performance in such scenarios by minimizing the distribution discrepancy between source domains. Among many studies, dropout-based DG approaches, which remove domain-specific features, have gained attention. However, they are limited in minimizing the upper bound of the generalization risk because they do not explicitly consider the distribution discrepancy when discarding features. In this paper, we propose a novel Discrepancy-guided Channel Dropout (DgCD) for DG that explicitly derives the discrepancy between domains and drops the channels with significant distribution discrepancy. Given a training batch, we perform two ways of standardization: (1) based on the variance/mean of the batch (i.e., sampled from all source domains) and (2) based on the variance/mean of the domain-wise samples in the batch. With the two normal distributions, we explicitly derive the discrepancy using KL-divergence and backpropagate it towards each channel. A channel with a higher contribution to the discrepancy is more likely to be dropped. Experimental results show the superiority of DgCD over the state-of-the-art DG baselines, demonstrating the effectiveness of our dropout strategy, which is directly coupled with reducing the domain discrepancy. Our code is available at: https://github.com/gyeomo/DgCD
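A minimal sketch of the per-channel discrepancy idea, assuming diagonal (per-channel) Gaussians estimated from batch versus domain-wise statistics; the top-k channel selection below is an illustrative simplification, not the paper's probabilistic, gradient-based drop rule.

```python
import torch

def channel_kl(mu_d, var_d, mu_b, var_b, eps=1e-5):
    """KL( N(mu_d, var_d) || N(mu_b, var_b) ) computed channel-wise.

    mu_d/var_d : per-channel stats of one domain's samples in the batch
    mu_b/var_b : per-channel stats of the whole batch (all source domains)
    """
    return 0.5 * (torch.log((var_b + eps) / (var_d + eps))
                  + (var_d + (mu_d - mu_b) ** 2) / (var_b + eps) - 1.0)

def discrepancy_dropout(features, domain_labels, drop_ratio=0.1):
    """features: (N, C) activations; zero out the channels with the largest
    domain-vs-batch KL divergence averaged over domains (illustrative rule)."""
    mu_b, var_b = features.mean(0), features.var(0, unbiased=False)
    kls = []
    for d in domain_labels.unique():
        x = features[domain_labels == d]
        kls.append(channel_kl(x.mean(0), x.var(0, unbiased=False), mu_b, var_b))
    kl_per_channel = torch.stack(kls).mean(0)              # (C,)
    k = max(1, int(drop_ratio * features.shape[1]))
    mask = torch.ones(features.shape[1], device=features.device)
    mask[kl_per_channel.topk(k).indices] = 0.0
    return features * mask, kl_per_channel

feats = torch.randn(64, 8)
domains = torch.randint(0, 3, (64,))
dropped, kl = discrepancy_dropout(feats, domains)
```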
Current face recognition (FR) algorithms frequently encounter discrimination issues with respect to various attributes (e.g., gender, age) due to the biased demographic distribution of the training datasets towards specific groups. In this paper, we study an identity-protected fair FR problem, where the goal is to augment the datasets with external face images while ensuring the anonymity of the corresponding face identities. Our problem is motivated by the limitation of current fairness-driven data augmentation approaches that directly utilize the external face images accessed by FR algorithm developers while ignoring the protection of the face identities of the image owners. To address the problem, we develop FaDE, a face segment-driven identity anonymization framework that augments biased face image datasets by identifying specific face segments with diversified demographic characteristics from external face images but with the least identity disclosure, and then reconstructing the segments into full face images with new identities. As a result, the augmented dataset has a more balanced demographic distribution and improves the fairness performance of the optimized FR algorithms. We evaluate FaDE on two public face datasets, CelebA and LFW, which suffer from various demographic imbalances. The results show that FaDE significantly enhances both the fairness and accuracy performance of the optimized FR algorithms, while keeping effective anonymity for the identities of external face images.
Federated Learning (FL) has emerged as a groundbreaking distributed learning paradigm enabling clients to train a global model collaboratively without exchanging data. Despite enhancing privacy and efficiency in information retrieval and knowledge management contexts, training and deploying FL models confront significant challenges such as communication bottlenecks, data heterogeneity, and memory limitations. To comprehensively address these challenges, we introduce FeDEQ, a novel FL framework that incorporates deep equilibrium learning and consensus optimization to harness compact global data representations for efficient personalization. Specifically, we design a unique model structure featuring an equilibrium layer for global representation extraction, followed by explicit layers tailored for local personalization. We then propose a new FL algorithm rooted in the alternating directions method of multipliers (ADMM), which enables the joint optimization of a shared equilibrium layer and individual personalized layers across distributed datasets. Our theoretical analysis confirms that FeDEQ converges to a stationary point, achieving both compact global representations and optimal personalized parameters for each client. Extensive experiments on various benchmarks demonstrate that FeDEQ matches the performance of state-of-the-art personalized FL methods, while significantly reducing communication size by up to 4 times and memory footprint by 1.5 times during training.
As deep learning technologies continue to evolve, the challenge of training neural networks with noisy data becomes increasingly critical. Incorrect labels often cause the model to memorize incorrect data (a phenomenon known as the memorization effect), significantly undermining both performance and the ability to generalize. Traditional methods to address noisy labels typically involve extensive modifications during training, leading to prolonged refinement processes. Although some recent approaches eliminate the need for retraining by using pre-trained models, they still face challenges with lengthy refinement times and inaccurate noise ratio estimations. In response, we introduce FastSimiFeat, a novel algorithm that efficiently utilizes the k-nearest neighbors (k-NN) technique on feature vectors derived from pre-trained models. This training-free method incorporates a new confusion matrix-based noise ratio estimator that significantly reduces the need for iterative refinement by adapting the number of k-NN cycles based on the detected noise level. Additionally, we propose an innovative label correction method that leverages potentially noisy data to enhance model robustness and generality. Our extensive evaluations on both synthetic and real-world datasets demonstrate that FastSimiFeat not only minimizes refinement time but also consistently outperforms existing methods in terms of accuracy. These results confirm the suitability of FastSimiFeat for industrial applications where reliable data processing is paramount. By leveraging the inherent features of neural networks pre-trained on large datasets, FastSimiFeat sets a new standard for minimal-dependency approaches in noisy data environments, facilitating the deployment of more reliable and efficient deep learning models across various sectors.
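As a rough, training-free illustration of the underlying idea (neighborhood label agreement in a pre-trained feature space; the confusion matrix-based noise ratio estimator, iterative cycles, and thresholds of FastSimiFeat are not reproduced here), a sample can be flagged as potentially mislabeled when its label disagrees with the majority of its nearest neighbors:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_label_check(features, noisy_labels, k=10):
    """Flag samples whose label disagrees with the majority label of their
    k nearest neighbors in the pre-trained feature space."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features)               # idx[:, 0] is the sample itself
    neighbor_labels = noisy_labels[idx[:, 1:]]
    majority = np.array([np.bincount(row).argmax() for row in neighbor_labels])
    suspect = majority != noisy_labels             # candidate noisy labels
    return suspect, majority                       # majority can serve as a corrected label

feats = np.random.randn(200, 32)                   # placeholder for pre-trained features
labels = np.random.randint(0, 3, size=200)
flags, corrected = knn_label_check(feats, labels, k=5)
print(flags.mean())
```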
Recently, there has been a growing interest among researchers in understanding molecules and their textual descriptions through molecule language models (MoLM). However, despite some early promising developments, the advancement of MoLM still trails significantly behind that of vision language models (VLM). This is because MoLM faces unique challenges beyond those of VLM, due to 1) a limited amount of molecule-text paired data and 2) missing expertise arising from the specialized areas of focus among the experts. To this end, we propose AMOLE, which 1) augments molecule-text pairs with a structural similarity preserving loss, and 2) transfers expertise between the molecules. Specifically, AMOLE enriches molecule-text pairs by sharing descriptions among structurally similar molecules with a novel structural similarity preserving loss. Moreover, we propose an expertise reconstruction loss to transfer knowledge from molecules that have extensive expertise to those with less expertise. Extensive experiments on various downstream tasks demonstrate the superiority of AMOLE in comprehending molecules and their descriptions, highlighting its potential for application in real-world drug discovery. The source code for AMOLE is available at https://github.com/Namkyeong/AMOLE.
Multi-behavior recommender systems, rapidly advancing across various domains, utilize plentiful auxiliary interactions on a variety of user behaviors to enhance recommendations for the target behavior, such as purchases. While previous methods have made strides in leveraging such interactions with advanced machine learning methods, they still face challenges in adequately using multi-faceted relationships among behaviors and in handling uncertain auxiliary interactions that may or may not lead to purchases. In this paper, we propose MuLe (Multi-Grained Graph Learning), a novel graph-based model designed to address these limitations. We design a multi-grained graph learning strategy to capture diverse aspects of behaviors, ranging from unified to specific, and then to target-related behavior interactions. To handle uncertain interactions, we use graph attention, weighting the importance of those interactions related to the target behavior. Afterward, we use an attention mechanism to effectively aggregate the diverse behavior embeddings obtained from the multi-grained graph encoders. Extensive experiments show that MuLe significantly outperforms the state-of-the-art methods, achieving improvements of up to 44.6% in HR@10 and 52.9% in NDCG@10. Our code and datasets are available at https://github.com/geonwooko/MULE.
Achieving the generalization of an invariant classifier from training domains to shifted test domains while simultaneously considering model fairness is a substantial and complex challenge in machine learning. Existing methods address the problem of fairness-aware domain generalization, focusing on either covariate shift or correlation shift, but rarely consider both at the same time. In this paper, we introduce a novel approach that focuses on learning a fairness-aware domain-invariant predictor within a framework addressing both covariate and correlation shifts simultaneously, ensuring its generalization to unknown test domains inaccessible during training. In our approach, data are first disentangled into content and style factors in latent spaces. Furthermore, fairness-aware domain-invariant content representations can be learned by mitigating sensitive information and retaining as much other information as possible. Extensive empirical studies on benchmark datasets demonstrate that our approach surpasses state-of-the-art methods with respect to model accuracy as well as both group and individual fairness.
During visual data analysis, users often explore visualizations one at a time, with each visualization leading to new directions of exploration. We consider a conversational approach to visualization, where users specify their needs at each step in natural language, with a visualization being returned in turn. Prior work has shown that visualization generation can be boiled down to the identification of visualization intent and visual encodings. Recognizing that the latter is a well-studied problem with standard solutions, we focus on the former, i.e., identifying visualization intent during conversation. We develop Luna, a framework comprising a novel combination of language models adapted from BERT and rule-based inference, which together predict various aspects of visualization intent. We compare Luna with other conversational NL-to-visualization and NL-to-SQL approaches (adapted to visualization intent), including GPT-3.5 and GPT-4, and demonstrate that Luna has 14.3% higher accuracy than the state-of-the-art. We also apply Luna to a usage scenario on a dataset of police misconduct, showcasing its benefits relative to other approaches.
Temporal motifs are recurring subgraph patterns in temporal graphs, and are present in various domains such as social networks, fraud detection, and biological networks. Despite their significance, counting temporal motifs efficiently remains a challenge, particularly on moderately sized datasets with millions of motif instances. To address this challenge, we propose a novel algorithm called Scalable Motif Counting with Time-aware Topology Constraint (MoTTo). MoTTo focuses on accurately counting temporal motifs with up to three nodes and three edges. It first utilizes a topology constraint-based pruning strategy to eliminate nodes that cannot participate in forming temporal motifs before the counting process. Then, it adopts a time-aware topology constraint-based pruning strategy to split large-scale datasets into independent partitions and filter out the unrelated ones, ensuring that the counting results remain unaffected. By investigating the second pruning strategy, we also find that MoTTo can be implemented in a multi-thread manner, further accelerating the counting process significantly. Experimental results on several real-world datasets of varying sizes demonstrate that MoTTo outperforms state-of-the-art methods in terms of efficiency, achieving up to a nine-fold improvement in total temporal motif counting. Specifically, the efficiency of counting triangular temporal motifs is enhanced by up to 31 times compared to state-of-the-art baselines.
In Vertical Federated Learning (VFL), rewards are necessary information for deciding whether cooperation can be reached, so they should be determined in advance. To determine reasonable rewards, participant contributions should be estimated precisely. We propose a Vertically Federated Contribution Estimation (VF-CE) method. VF-CE calculates the Mutual Information (MI) between distributed features and the label using a neural network trained via VFL itself. Note that the compensation for CE is low, as it only covers computation costs, whereas the reward for real VFL training is high, as it needs to cover training costs as well as participants' contributions to model performance and the resulting business benefits. Because MI presents a strong positive correlation with the final model performance, contributions to model performance can be estimated based on contributions to MI. We integrate a scalar-level attention mechanism into the MI neural network, and the attention weights of participants are treated as their contributions. We find that attention weights can effectively measure contribution redundancy, as their Spearman correlation coefficient with the Shapley value is as high as 0.963. We demonstrate that VF-CE also satisfies the balance, zero element, and symmetry properties concerning fairness, which are hallmark properties of the Shapley value. Compared with existing work, we consider contribution redundancy precisely, efficiently output approximated Shapley values through one MI calculation instead of 2^n calculations, where n is the number of participants, and introduce no extra privacy risk except the inherent risk in VFL, i.e., gradient transmission.
Software vulnerabilities are a major challenge in cybersecurity. Manual security patches are often difficult and slow to deploy, while new vulnerabilities continue to be created. Binary code vulnerability detection is less studied and more complex than source code vulnerability detection, and it has important practical implications. Deep learning has become an efficient and powerful tool in the security domain, where it provides end-to-end and accurate prediction. Modern deep learning approaches learn program semantics through sequence and graph neural networks, using various intermediate representations of programs, such as abstract syntax trees (AST) or control flow graphs (CFG). Due to the complex nature of program execution, the output of an execution depends on the many possible program states and inputs. Also, a CFG generated from static analysis can be an overestimation of the true program flow. Moreover, the size of programs often does not allow a graph neural network with fixed layers to aggregate global information. To address these issues, we propose DeepEXE, an agent-based implicit neural network that mimics the execution path of a program. We use reinforcement learning to enhance the branching decision at every program state transition and create a dynamic environment to learn the dependency between a vulnerability and certain program states. An implicitly defined neural network enables nearly infinite state transitions until convergence, which captures the structural information at a higher level. The experiments are conducted on two semi-synthetic and two real-world datasets. We show that DeepEXE is an accurate and efficient method and outperforms the state-of-the-art vulnerability detection methods.
Different from traditional knowledge graphs, where facts are usually represented as (subject, relation, object), hyper-relational knowledge graphs (HKGs) allow facts to be associated with additional relation-entity pairs to constrain the validity of facts. HKGs contain a substantial amount of textual information, which plays a crucial role in enriching representations. However, existing HKG embedding methods mainly rely on structural information but overlook textual information in HKGs, making them less effective in representing entities with limited structural information. To address this issue, the paper proposes HIST (Hyper-relational Knowledge Graph Encoder Integrating Structure and Text), which incorporates textual information and structural information in HKGs to enhance representations of entities and relations. HIST adopts the graph convolutional network to extract structural information and utilizes it to generate the Structure Soft Prompt. During the Structure Soft Prompt Tuning process, the textual information and structural information are fully integrated to generate more comprehensive representations. Additionally, an effective contrastive learning method for HKG embedding is formulated to improve the efficiency of negative sampling. Experimental results show that HIST achieves state-of-the-art performance on several public datasets. Our code is available at https://github.com/QieFangBaiLuQingYaJian/HIST.
Long-term time series forecasting has gained significant attention in recent years due to its wide application in various fields. Transformer-based models have gained popularity for their ability to capture long-sequence interactions. However, these models are limited in real-world use because of their memory consumption and computational cost. CNN-based models are also among the main models used for time series prediction, but their performance has lagged behind Transformer-based models in previous works. We have reconsidered the role of CNN components and redefined the way CNN basic components are used for time series prediction. In addition, the time-lag information between periods in a time series is important. Unfortunately, existing works lack consideration of this classic but important information. Motivated by these factors, we propose a fast yet effective CNN model with time lags for multivariate long-term time series forecasting, named LagCNN. Specifically, the time series is transformed into lag-patches to capture the correlation between periods. Then, a fast CNN model operates along the feature dimension rather than the time dimension used in most previous works. Meanwhile, information aggregation is performed in the time dimension to extract complex temporal patterns. LagCNN significantly outperforms the state of the art on multiple publicly available datasets. Furthermore, LagCNN exhibits significant efficiency advantages over the most efficient Transformer model (PatchTST), reducing memory usage by 4.4× and runtime by 10.7×.
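As a rough illustration of the lag-patch idea (the exact patch layout and hyperparameters in LagCNN are assumptions here), a minimal NumPy sketch aligns each step with the same phase in preceding periods:

```python
import numpy as np

def to_lag_patches(series: np.ndarray, lag: int, num_lags: int) -> np.ndarray:
    """Stack the series with itself shifted by multiples of `lag`, so the last axis
    aligns each step with the same phase in neighboring periods.
    Output shape: (usable_steps, channels, num_lags)."""
    T, C = series.shape
    usable = T - lag * (num_lags - 1)
    if usable <= 0:
        raise ValueError("series too short for the requested lags")
    return np.stack(
        [series[i * lag : i * lag + usable] for i in range(num_lags)], axis=-1
    )

# e.g. a (720, 7) hourly series with lag=24 and num_lags=3 yields a (672, 7, 3) array
# whose last axis pairs each hour with the same hour in the two adjacent days.
```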
The development of cloud computing has met the growing demand for dataset search in the era of massive data. In the field of spatial dataset search, the high prevalence of sensitive information in spatial datasets underscores the necessity of privacy-preserving search processing in the cloud. However, existing spatial dataset search schemes are designed for plaintext datasets and do not consider privacy protection in search processing. In this paper, we first propose a privacy-preserving spatial dataset search scheme. The density distribution-based similarity model is proposed to measure the similarity between spatial datasets, and then the order-preserving encrypted similarity is designed to achieve secure similarity calculation. With the above idea, the baseline search scheme (PriDAS) is proposed. To improve the search efficiency, a two-layer index is designed to filter candidate datasets and accelerate the similarity calculation between datasets. By using the index, the optimized search scheme (PriDAS+) is proposed. To analyze the security of the proposed schemes, a game simulation-based proof is presented. Experimental results on three real-world spatial data repositories with 100,000 spatial datasets show that PriDAS+ needs less than 0.4 seconds to complete the search processing.
Unsupervised graph representation learning (UGRL) based on graph neural networks (GNNs) has received increasing attention owing to its efficacy in handling graph-structured data. However, existing UGRL methods ideally assume that the node features are noise-free, which makes them fail to distinguish between useful information and noise when applied to real data with noisy features, thus affecting the quality of learned representations. This urges us to take noisy node features into account in real-world UGRL. Through empirical analysis, we reveal that feature propagation, the essential operation in GNNs, acts as a "double-edged sword" in handling noisy features - it can both denoise and diffuse noise, leading to varying feature quality across nodes, even within the same node at different hops. Building on this insight, we propose a novel UGRL method based on <u>M</u>ulti-hop feature <u>Q</u>uality <u>E</u>stimation (MQE for short). Unlike most UGRL models that directly utilize propagation-based GNNs to generate representations, our approach aims to learn representations by estimating the quality of propagated features at different hops. Specifically, we introduce a Gaussian model that utilizes a learnable "meta-representation" as a condition to estimate the expectation and variance of multi-hop propagated features via neural networks. In this way, the "meta-representation" captures the semantic and structural information underlying multiple propagated features but is naturally less susceptible to interference from noise, thereby serving as high-quality node representations beneficial for downstream tasks. Extensive experiments on multiple real-world datasets demonstrate the effectiveness of MQE in learning reliable node representations in scenarios with diverse types of feature noise.
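A minimal sketch of the quality-estimation idea, assuming a learnable per-node meta-representation mapped to a Gaussian that should explain every hop's propagated features (names and dimensions are illustrative, not the paper's code):

```python
import torch
import torch.nn as nn

class HopQualityEstimator(nn.Module):
    """A learnable per-node meta-representation is mapped to a Gaussian (mean, variance)
    that should explain the features observed at every propagation hop; the fitted
    meta-representations then serve as the node embeddings."""

    def __init__(self, num_nodes: int, feat_dim: int, meta_dim: int = 64):
        super().__init__()
        self.meta = nn.Parameter(0.01 * torch.randn(num_nodes, meta_dim))
        self.mu = nn.Linear(meta_dim, feat_dim)
        self.log_var = nn.Linear(meta_dim, feat_dim)

    def loss(self, propagated):                    # list over hops of (num_nodes, feat_dim) tensors
        mu, log_var = self.mu(self.meta), self.log_var(self.meta)
        nll = 0.0
        for x in propagated:                       # Gaussian negative log-likelihood per hop
            nll = nll + 0.5 * (((x - mu) ** 2) / log_var.exp() + log_var).mean()
        return nll / len(propagated)
```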
Lane-level traffic prediction is crucial for refined smart city applications, yet the scarcity and quality issues of datasets hinder its development. To overcome these challenges, this study introduces a novel <u>M</u>ulti-<u>c</u>hannel <u>g</u>raph-structured <u>V</u>ariational <u>A</u>uto<u>E</u>ncoder model, McgVAE. This model integrates road-level information to provide a global perspective for lane prediction and performs integrated tasks through three interconnected channels: the road-level channel ensures accurate prediction of road traffic states and communicates closely with the data quality channel to share historical and predicted road information; the data quality channel leverages road-level information to identify and correct missing and noisy data; and finally, the lane channel uses the aforementioned information for lane-level traffic prediction. After extensive experimental comparisons with multiple baseline models across three datasets, the McgVAE model demonstrated outstanding predictive performance and the ability to handle data missingness and noise.
Medication recommendation systems are developed to recommend suitable medications tailored to specific patients. Previous research primarily focuses on learning medication representations, which has yielded notable advances. However, these methods are limited in capturing personalized patient representations due to the following primary limitations: (i) they are unable to capture the differences in the impact of diseases/procedures on patients across various patient health states; (ii) they fail to model the direct causal relationships between medications and the specific health states of patients, resulting in an inability to determine which specific disease each medication is treating. To address these limitations, we propose CausalMed, a patient health state-centric model capable of enhancing the personalization of patient representations. Specifically, CausalMed first captures the causal relationships between diseases/procedures and medications through causal discovery and evaluates their causal effects. Building upon this, CausalMed focuses on analyzing the health state of patients, capturing the dynamic differences of diseases/procedures in different health states of patients, and transforming diseases/procedures into medications based on direct causal relationships. Ultimately, CausalMed integrates information from longitudinal visits to recommend medication combinations. Extensive experiments on real-world datasets show that our method learns more personalized patient representations and outperforms state-of-the-art models in accuracy and safety.
Diversity is increasingly recognized as a crucial factor in recommendation systems for enhancing user satisfaction. However, existing studies on diversity-enhanced recommendation systems primarily focus on designing recommendation strategies, often overlooking the development of evaluation metrics. Widely used diversity metrics such as CC, ILAD, and ILMD are typically assessed independently of accuracy. This separation leads to a critical limitation: existing diversity measures are unable to distinguish between diversity improvements from effective recommendations and those from ineffective recommendations. Our evaluations reveal that the diversity improvements are primarily contributed by ineffective recommendations, which often do not positively contribute to user satisfaction. Furthermore, existing diversity metrics disregard the feature distribution of ground-truth items, potentially skewing the assessment of diversity performance. To address these limitations, we design three new accuracy-aware metrics: DCC, FDCC, and DILAD, and conduct a re-evaluation using these metrics. Surprisingly, our results illustrate that the diversity improvements of existing diversity-enhanced approaches are limited and even negative compared to those of accurate recommendations. This finding underscores the need to explore more sophisticated diversity-enhanced techniques for improving the diversity within effective recommendations.
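One plausible reading of an accuracy-aware coverage metric in the spirit of DCC (the paper's exact definition may differ): only recommended items that are actual hits may contribute diversity credit.

```python
def distinct_category_coverage(recommended, relevant, item_categories):
    """Category coverage counted only over recommended items that are hits,
    so diversity credit cannot come from ineffective recommendations."""
    hits = [i for i in recommended if i in relevant]
    covered = {item_categories[i] for i in hits}
    total = set(item_categories.values())
    return len(covered) / max(len(total), 1)

# e.g. with categories {1: "a", 2: "a", 3: "b", 4: "c"}, recommending [1, 3, 4] to a user
# whose ground truth is {3} yields 1/3: only the hit on item 3 contributes diversity.
```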
Despite the recent significant advancements in poster layout generation, existing works are mainly unaware of the given design elements (i.e., text, logo, and underlay), which leads to undesirable layouts or visual artifacts. The visual artifacts we refer to include (1) improper sizes, e.g., placing a short piece of text into a large text box or long texts into small text boxes, and (2) image distortion, e.g., the stretched logo in Fig. 1. To advance research in this field, we propose a new design-element aware poster layout generation task, which requires the generated layouts not only to have harmonious relationships but also to fit well with the design elements. To address this task, we propose Design Element aware Transformer (DET), an encoder-decoder based transformer network, to generate reasonable layouts that fit not only the background images but also the design elements. The encoder extracts a fine-grained multi-scale representation from the background image and its saliency map. The decoder receives the background features and produces layouts conditioned on the content and desired sizes of the design elements. Adopting the multi-scale representation and the deformable attention in both the encoder and decoder enables our method to accurately understand and generate the spatial relationships between the background objects and design elements. We adapted three public poster layout generation datasets to fit our task and conducted experiments on them. In the meantime, we propose a new evaluation metric called AspDiff to measure whether the generated layout matches the given design elements. Quantitative and qualitative evaluation on three datasets demonstrates that DET yields superior results compared to other layout generation methods. Our code and datasets will be released.
Graphs constructed from real-world scenarios are often incomplete due to privacy restrictions or resource limitations, posing significant challenges for node classification, especially when labeled data are scarce. In many scenarios of incomplete graphs, the real node degrees, such as the number of followers in social networks or the number of references of publications in citation networks, are easily accessible and informative, and could indicate the degree of incompleteness. However, most existing research on incomplete graphs focuses on edge completion and ignores node completion with known node degrees. In this paper, we propose a new few-shot node classification problem on incomplete graphs with real node degrees. To deal with the node completion, edge completion, and label completion of this problem, we develop an effective Large Language Models (LLMs) empowered Graph Convolutional Network (GCN) model utilizing the real node Degrees, namely LLMDGCN. First, we leverage LLMs to initially fill in the missing nodes and labels. Next, we design an edge prediction module that employs the real node degrees and an inter-category probability matrix to recover the missing edges for each node. We then iteratively train the GCN and the edge prediction module. The GCN generates pseudo labels, which the edge prediction module uses to restore edges, and these edges are fed back into the GCN to improve accuracy. Extensive experiments on four benchmark datasets demonstrate the effectiveness and robustness of our proposed method for few-shot node classification on incomplete graphs with real node degrees.
Graph embedding has become a powerful tool for learning latent representations of nodes in a graph. Despite its superior performance in various graph-based machine learning tasks, serious privacy concerns arise when the graph data contains personal or sensitive information. To address this issue, we investigate and develop graph embedding algorithms that satisfy local differential privacy (LDP). We introduce a novel privacy-preserving graph embedding framework, named PrivGE, to protect node data privacy. Specifically, we propose an LDP mechanism to obfuscate node data and utilize personalized PageRank as the proximity measure to learn node representations. Furthermore, we provide a theoretical analysis of the privacy guarantees and utility offered by the PrivGE framework. Extensive experiments on several real-world graph datasets demonstrate that PrivGE achieves an optimal balance between privacy and utility, and significantly outperforms existing methods in node classification and link prediction tasks.
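For intuition, below is a minimal sketch of one standard LDP-style feature perturbation (Laplace noise on clipped values); PrivGE's actual mechanism and privacy-budget allocation may well differ, so treat this purely as an illustration of the general idea.

```python
import numpy as np

def ldp_perturb(x: np.ndarray, epsilon: float, lo: float = -1.0, hi: float = 1.0) -> np.ndarray:
    """Clip a user's feature vector and add Laplace noise calibrated to the clipping range.
    For a d-dimensional vector the per-coordinate budget would be epsilon / d in practice."""
    x = np.clip(x, lo, hi)
    scale = (hi - lo) / epsilon          # sensitivity of one clipped coordinate
    return x + np.random.laplace(loc=0.0, scale=scale, size=x.shape)
```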
Knowledge graph (KG) completion has been increasingly recognized as a vital approach for uncovering missing knowledge and addressing the incompleteness issue in KGs. To enhance inference on rare relations and mitigate the impact of the long-tail distribution, the dominant strategy designs few-shot models following the meta-learning paradigm. However, these approaches typically operate under the assumption that KGs are available instantly, disregarding the newly emerging relations during KG enrichment. Thus, the emergence of these novel relations presents a need for few-shot models to continually learn from emerging knowledge. Although promising, two significant obstacles, i.e., catastrophic forgetting and the scarcity of novel relations, prevent effective learning from newly emerging relations. In this paper, we propose a novel framework designed to equip the few-shot model with the ability to learn sequentially from novel relations. Specifically, we introduce innovative strategies at both data and model levels: data-level rehearsal and model-level modulation to address catastrophic forgetting, alongside multi-view relation augmentation aimed at resolving the issue of insufficient novel relations. Extensive experiments conducted on real-world KGs validate the effectiveness of our proposed method.
Recent works demonstrate the effectiveness of multi-modal information for sequential recommendation. However, the computational cost and representation degeneration of multi-modality recommendation have not been specifically examined or adequately addressed. To this end, we first identify and formalize three properties, i.e., diversity, compactness, and consistency, from the geometric-space and spectrum perspectives. Building upon this foundation, we devise tailored loss functions to regularize the above three properties for representation optimization. Theoretical underpinnings and experimental results demonstrate the efficacy of an enhanced item representation in ameliorating degeneration. Furthermore, we propose an efficient and expandable image-centered method, named E2ImgRec, to mitigate the immense cost of computation. Concretely, we substitute the linear projection operations in the self-attention module and feed-forward network layer with two learnable rescaling vectors for efficient recommendation, and then leverage cross-attention for multi-modality information fusion. Extensive experiments on three public datasets illustrate that our method outperforms representative ID-based solutions and multi-modal state-of-the-art methods while requiring only up to 39.9% of the memory and achieving a 4.3× acceleration in training time. The code for replication is available at https://github.com/WHUIR/E2ImgRec.
Social recommendation has emerged as a powerful approach to enhance personalized recommendations by leveraging the social connections among users, such as following and friend relations observed in online social platforms. The fundamental assumption of social recommendation is that socially-connected users exhibit homophily in their preference patterns. This means that users connected by social ties tend to have similar tastes in user-item activities, such as rating and purchasing. However, this assumption is not always valid due to the presence of irrelevant and false social ties, which can contaminate user embeddings and adversely affect recommendation accuracy. To address this challenge, we propose a novel diffusion-based social denoising framework for recommendation (RecDiff). Our approach utilizes a simple yet effective hidden-space diffusion paradigm to alleviate the noisy effect in the compressed and dense representation space. By performing multi-step noise diffusion and removal, RecDiff possesses a robust ability to identify and eliminate noise from the encoded user representations, even when the noise levels vary. The diffusion module is optimized in a downstream task-aware manner, thereby maximizing its ability to enhance the recommendation process. We conducted extensive experiments to evaluate the efficacy of our framework, and the results demonstrate its superiority in terms of recommendation accuracy, training efficiency, and denoising effectiveness. The source code for the model implementation is publicly available at: https://github.com/HKUDS/RecDiff.
Recovering missing values is an essential but challenging task due to the complex latent spatio-temporal correlations and dynamic nature of time series. Owing to their strength in structure learning, Graph Neural Networks (GNNs) and Recurrent Neural Networks (RNNs) are often used to capture such complex spatio-temporal features in multivariate time series. However, these data-driven models often fail to capture the essential spatio-temporal relationships when significant signal corruption occurs. Additionally, computing high-order neighbor nodes in these models is computationally expensive. To address these problems, we propose a novel higher-order spatio-temporal physics-incorporated GNN (HSPGNN). First, a dynamic Laplacian matrix is obtained via a spatial attention mechanism. Then, the generic inhomogeneous partial differential equation (PDE) of physical dynamic systems is used to construct the dynamic higher-order spatio-temporal GNN adaptively and obtain the missing time series values. Moreover, we estimate the missing impact with Normalizing Flows (NF) to evaluate the importance of each node in the graph for better explainability. Experimental results on four benchmark datasets demonstrate the effectiveness of HSPGNN and its superior performance when combining neighbor nodes of various orders. Also, graph-like optical flow, dynamic graphs, and missing impact can be obtained naturally by HSPGNN, which provides better dynamic analysis and explanation than traditional data-driven models.
Graph Databases (Graph DB) find extensive application across diverse domains such as finance, social networks, and medicine. Yet, the translation of Natural Language (NL) into the Graph Query Language (GQL), referred to as NL2GQL, poses significant challenges owing to its intricate and specialized nature. Some approaches have sought to utilize Large Language Models (LLMs) to address analogous tasks like text2SQL. Nonetheless, in the realm of NL2GQL tasks tailored to a particular domain, the absence of domain-specific NL-GQL data pairs adds complexity to aligning LLMs with the graph DB. To tackle this challenge, we present a well-defined pipeline. Initially, we use ChatGPT to generate NL-GQL data pairs, leveraging the provided graph DB and two mutual verification self-instruct methods which ensure consistency between NL and GQL. Subsequently, we employ the generated data to fine-tune LLMs, ensuring alignment between LLMs and the graph DB. Moreover, we find that the relevant schema is important for efficiently generating accurate GQLs. Thus, we introduce a method to extract the relevant schema as the input context. We evaluate our method using two carefully constructed datasets derived from graph DBs in the finance and medicine domains, named FinGQL and MediGQL. Experimental results reveal that our approach significantly outperforms a set of baseline methods, with improvements of 5.90 and 6.36 absolute points on EM, and 6.00 and 7.09 absolute points on EX for FinGQL and MediGQL, respectively.
Large language models (LLMs) have shown impressive success in various applications. However, they encounter issues in accurately understanding user intentions, thereby impeding the successful accomplishment of tasks. A pioneering study tackles intention understanding by iteratively interacting with users to enhance response quality; however, it overlooks the central challenges of the task, where efficiency and accuracy are paramount for ensuring an optimal user experience. To address these challenges, we introduce a new interactive table-based intention understanding (ITIU) framework, which draws on non-linear thinking in psychology so that the details of an intention are generated in parallel. Specifically, in the table interaction design phase, ITIU first brainstorms a more concrete intention table relevant to user instructions and subsequently incorporates a rule-based supervision mechanism to enhance the accuracy of its content. In the specialized model training phase, we use the procedural records generated by ITIU to develop a specialized upstream interactive intention understanding model. The specialized model replaces internal steps within the original interaction design for further efficiency improvement. Comprehensive experimental results demonstrate that ITIU significantly outperforms existing intention understanding methods, particularly in terms of interaction efficiency and intention understanding accuracy. Furthermore, whether integrated into the open-source LLaMA or powerful LLMs like GPT-4 and Claude-3, ITIU shows significant performance improvements. All data and code are released.
Adversarial training (AT) can help improve the robustness of Vision Transformers (ViT) against adversarial attacks by intentionally injecting adversarial examples into the training data. However, this way of adversarial injection inevitably incurs standard accuracy degradation to some extent, thereby calling for a trade-off between standard accuracy and adversarial robustness. Besides, the prominent AT solutions are still vulnerable to adaptive attacks. To tackle such shortcomings, this paper proposes a novel ViT architecture, including a detector and a classifier bridged by our newly developed adaptive ensemble. Specifically, we empirically discover that detecting adversarial examples can benefit from the Guided Backpropagation technique. Driven by this discovery, a novel Multi-head Self-Attention (MSA) mechanism is introduced to enhance our detector's ability to sniff out adversarial examples. Then, a classifier with two encoders is employed to extract visual representations respectively from clean images and adversarial examples, with our adaptive ensemble adaptively adjusting the proportion of visual representations from the two encoders for accurate classification. This design enables our ViT architecture to achieve a better trade-off between standard accuracy and adversarial robustness. Besides, the adaptive ensemble technique allows us to mask off a random subset of image patches within input data, boosting our ViT's robustness against adaptive attacks while maintaining high standard accuracy. Experimental results show that our ViT architecture, on CIFAR-10, achieves the best standard accuracy and adversarial robustness of 90.3% and 49.8%, respectively.
Multimodal recommendation systems (MMRS) have received considerable attention from the research community due to their ability to jointly utilize information from user behavior and product images and text. Previous research has two main issues. First, many long-tail items in recommendation systems have limited interaction data, making it difficult to learn comprehensive and informative representations. However, past MMRS studies have overlooked this issue. Secondly, users' modality preferences are crucial to their behavior. However, previous research has primarily focused on learning item modality representations, while user modality representations have remained relatively simplistic. To address these challenges, we propose a novel <u>G</u>raphs and <u>U</u>ser <u>M</u>odalities <u>E</u>nhancement (GUME) for long-tail multimodal recommendation. Specifically, we first enhance the user-item graph using multimodal similarity between items. This improves the connectivity of long-tail items and helps them learn high-quality representations through graph propagation. Then, we construct two types of user modalities: explicit interaction features and extended interest features. By using the user modality enhancement strategy to maximize mutual information between these two features, we improve the generalization ability of user modality representations. Additionally, we design an alignment strategy for modality data to remove noise from both internal and external perspectives. Extensive experiments on four publicly available datasets demonstrate the effectiveness of our approach. The code and data are publicly accessible via GitHub.
Integrated Warehousing and Distribution Supply Networks (IWDSN) have shown their high efficiency in E-commerce. Efficient supply capacity prediction is crucial for logistics systems to maintain the delivery capacity needed to meet users' requirements. However, unforeseen events such as extreme weather and public health emergencies pose challenges in supply forecasting. Previous work mainly infers supply optimization based on the invariant topology of logistic networks, neglecting dynamic routing and the distinct ways nodes react to emergencies. To address these challenges, it is necessary to model the hierarchical relations among warehouses, sorting centers, and delivery stations in logistic networks in order to learn these diverse reactions. In this paper, we propose a hierarchical spatio-temporal graph learning model to predict the emergency supply capacity of IWDSN based on micro and macro graphs. The micro graph captures transportation connectivity, while the macro graph captures geographical correlation. Specifically, the model consists of three components. (1) For micro graphs, a metapath aggregation strategy is designed to capture dynamic routing information on both route-view and event-view graphs. (2) For macro graphs, a bipartite graph learning approach is designed to extract spatial representations. (3) For spatio-temporal feature fusion, the spatio-temporal joint forecasting module combines the temporal features from the time-series encoder with hierarchical spatial features to predict the future supply capacity. Extensive experiments on two real-world datasets demonstrate the effectiveness of our proposed model, which achieves state-of-the-art performance compared with advanced baselines.
Network embedding has numerous practical applications and has received extensive attention in graph learning, which aims at mapping vertices into a low-dimensional and continuous dense vector space by preserving the underlying structural properties of the graph. Many network embedding methods have been proposed, among which factorization of the Personalized PageRank (PPR for short) matrix has been empirically and theoretically well supported recently. However, several fundamental issues remain unaddressed. (1) Existing methods invoke a seminal Local Push subroutine to approximate a single row or column of the PPR matrix. Thus, they have to execute n (n is the number of nodes) Local Push subroutines to obtain a provable PPR matrix, resulting in prohibitively high computational costs for large n. (2) The PPR matrix has limited power in capturing the structural similarity between vertices, leading to performance degradation. To overcome these dilemmas, we propose PSNE, an efficient spectral sParsification method for Scaling Network Embedding, which can quickly obtain embedding vectors that retain strong structural similarities. Specifically, PSNE first designs a matrix polynomial sparsifier to accelerate the calculation of the PPR matrix, which has a theoretical guarantee in terms of the Frobenius norm. Subsequently, PSNE proposes a simple but effective multiple-perspective strategy to further enhance the representation power of the obtained approximate PPR matrix. Finally, PSNE applies a randomized singular value decomposition algorithm to the sparse and multiple-perspective PPR matrix to get the target embedding vectors. Experimental evaluation on real-world and synthetic datasets shows that our solutions are indeed more efficient, effective, and scalable compared with ten competitors.
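The final factorization step can be pictured with a short sketch using off-the-shelf randomized SVD; the log transform and the square-root scaling below are common choices in PPR-factorization pipelines rather than PSNE's exact recipe.

```python
import numpy as np
from scipy import sparse
from sklearn.utils.extmath import randomized_svd

def embed_from_ppr(ppr: sparse.csr_matrix, dim: int = 128) -> np.ndarray:
    """Factorize a (sparsified) PPR-like proximity matrix with randomized SVD and
    use the scaled left singular vectors as node embeddings."""
    m = ppr.copy()
    m.data = np.log1p(m.data)                      # common element-wise transform before factorization
    u, s, _ = randomized_svd(m, n_components=dim, random_state=0)
    return u * np.sqrt(s)                          # embeddings of shape (num_nodes, dim)
```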
Sequential recommender systems aim to predict users' next interactions by modeling user behavior with various operators such as RNNs and attention mechanisms. However, existing models generally fail to achieve the three golden principles for sequential recommendation simultaneously, i.e., training efficiency, low-cost inference, and strong performance. To this end, we propose RecBLR, an Efficient Sequential Recommendation Model based on Behavior-Dependent Linear Recurrent Units, to accomplish the impossible triangle of the three principles. By incorporating gating mechanisms and behavior-dependent designs into linear recurrent units, our model significantly enhances user behavior modeling and recommendation performance. Furthermore, we unlock parallelizable training as well as inference efficiency for our model by designing a hardware-aware scanning acceleration algorithm with a customized CUDA kernel. Extensive experiments on real-world datasets with varying lengths of user behavior sequences demonstrate RecBLR's remarkable effectiveness in simultaneously achieving all three golden principles - strong recommendation performance, training efficiency, and low-cost inference, while exhibiting excellent scalability to datasets with long user interaction histories.
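To show only the recurrence form (RecBLR itself uses a hardware-aware parallel scan with a custom CUDA kernel, which this sequential loop does not reproduce), a simplified behavior-dependent linear recurrent unit might look like the following sketch; all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BehaviorDependentLRU(nn.Module):
    """Sequential (non-parallel) reading of a gated linear recurrence whose per-channel
    decay is conditioned on the behavior type of each interaction."""

    def __init__(self, dim: int, num_behaviors: int):
        super().__init__()
        self.decay_emb = nn.Embedding(num_behaviors, dim)   # behavior-dependent forget gates
        self.in_proj = nn.Linear(dim, dim)

    def forward(self, items: torch.Tensor, behaviors: torch.Tensor) -> torch.Tensor:
        # items: (batch, seq_len, dim) item embeddings; behaviors: (batch, seq_len) long ids
        h = items.new_zeros(items.size(0), items.size(2))
        outputs = []
        for t in range(items.size(1)):
            a = torch.sigmoid(self.decay_emb(behaviors[:, t]))        # (batch, dim) decay in (0, 1)
            h = a * h + (1.0 - a) * torch.tanh(self.in_proj(items[:, t]))
            outputs.append(h)
        return torch.stack(outputs, dim=1)                            # (batch, seq_len, dim)
```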
Click-through rate (CTR) prediction is crucial for personalized online services. Sample-level retrieval-based models, such as RIM, have demonstrated remarkable performance. However, they face challenges including inference inefficiency and high resource consumption due to the retrieval process, which hinder their practical application in industrial settings. To address this, we propose a universal plug-and-play <u>r</u>etrieval-<u>o</u>riented <u>k</u>nowledge (ROK) framework that bypasses the real retrieval process. The framework features a knowledge base that preserves and imitates the retrieved & aggregated representations using a decomposition-reconstruction paradigm. Knowledge distillation and contrastive learning optimize the knowledge base, enabling the integration of retrieval-enhanced representations with various CTR models. Experiments on three large-scale datasets demonstrate ROK's exceptional compatibility and performance, with the neural knowledge base serving as an effective surrogate for the retrieval pool. ROK surpasses the teacher model while maintaining superior inference efficiency and demonstrates the feasibility of distilling knowledge from non-parametric methods using a parametric approach. These results highlight ROK's strong potential for real-world applications and its ability to transform retrieval-based methods into practical solutions. Our implementation code is available to support reproducibility.
Recommender systems play a pivotal role across practical scenarios, showcasing remarkable capabilities in user preference modeling. However, the centralized learning paradigm predominantly used raises serious privacy concerns. The federated recommender system (FedRS) addresses this by updating models on clients, while a central server orchestrates training without accessing private data. Existing FedRS approaches, however, face unresolved challenges, including non-convex optimization, vulnerability, potential privacy leakage risk, and communication inefficiency. This paper addresses these challenges by reformulating the federated recommendation problem as a convex optimization issue, ensuring convergence to the global optimum. Based on this, we devise a novel method, RFRec, to tackle this optimization problem efficiently. In addition, we propose RFRecF, a highly efficient version that incorporates non-uniform stochastic gradient descent to improve communication efficiency. In user preference modeling, both methods learn local and global models, collaboratively learning users' common and personalized interests under the federated learning setting. Moreover, both methods significantly enhance communication efficiency, robustness, and privacy protection, with theoretical support. Comprehensive evaluations on four benchmark datasets demonstrate RFRec and RFRecF's superior performance compared to diverse baselines. The code is available to ease reproducibility.
Cognitive reasoning holds a significant place within Natural Language Processing (NLP). Yet, the exploration of zero-shot scenarios, which align more closely with real-life situations than supervised scenarios, has been relatively limited. While a few studies have employed Large Language Models (LLMs) to tackle zero-shot cognitive reasoning tasks, they still grapple with two key challenges: 1) Traditional approaches rely on the chain-of-thought (CoT) mechanism, wherein LLMs are provided with a "Let's think step by step'' prompt. However, this schema may not accurately understand the meaning of a given question and ignores the possible learned knowledge (e.g., background or commonsense) of the LLMs about the questions, leading to incorrect answers. 2) Previous CoT methods normally exploit a single Large Language Model (LLM) and design many strategies to augment this LLM. We argue that the power of a single LLM is typically finite since it may not have learned some relevant knowledge about the question. To address these issues, we propose a Multi-LLM Knowledge Fusion (MLKF) approach, which resorts to heterogeneous knowledge emerging from multiple LLMs, for zero-shot cognitive reasoning tasks. Through extensive experiments and detailed analysis, we demonstrate that our MLKF can outperform the existing zero-shot or unsupervised state-of-the-art methods on four kinds of zero-shot tasks: aspect sentiment analysis, named entity recognition, question answering, and mathematical reasoning. Our code is available at https://github.com/trueBatty/MLKF
Enabling various parties to share data enhances online fraud detection capabilities, since fraudsters tend to reuse resources to attack multiple platforms. Multi-party computation (MPC) techniques, such as secret sharing, offer potential privacy-preserving solutions but face efficiency challenges when handling large-scale data. This paper presents a novel approach, SecureFD (Secure Fraud Detector), aimed at detecting fraud in multi-party graph data while ensuring privacy, accuracy, and scalability. We propose a graph neural network, EPR-GNN, which is MPC-friendly, as the base detector. Then we design a framework that allows multiple parties to train EPR-GNN collaboratively on secure sparse graphs in a privacy-preserving manner. The oblivious node embedding sharing protocol in the collaborative training procedure achieves up to a 45× speed-up, supporting over four million users compared to the naive solution. Additionally, we further reduce secure computation by locally pruning a significant number of non-suspicious users and selecting only the most valuable resources for sharing. Experiments on real datasets demonstrate that by securely integrating data from different parties, SecureFD achieves superior detection performance compared to state-of-the-art local detectors. The local pruning greatly improves scalability without compromising detection accuracy.
Representation learning in sequential recommendation is critical for accurately modeling user interaction patterns and improving recommendation precision. However, existing approaches predominantly emphasize item-to-item transitions, often neglecting the time intervals between interactions, which are closely related to behavior pattern changes. Additionally, broader interaction attributes, such as item frequency, are frequently overlooked. We found that both sequences with more uniform time intervals and items with higher frequency yield better prediction performance. Conversely, non-uniform sequences exacerbate user interest drift, and less-frequent items are difficult to model due to sparse sampling, presenting unique challenges inadequately addressed by current methods. In this paper, we propose UniRec, a novel bidirectional enhancement sequential recommendation method. UniRec leverages sequence uniformity and item frequency to enhance performance, particularly improving the representation of non-uniform sequences and less-frequent items. These two branches mutually reinforce each other, driving comprehensive performance optimization in complex sequential recommendation scenarios. Additionally, we present a multidimensional time module to further enhance adaptability. To the best of our knowledge, UniRec is the first method to utilize the characteristics of uniformity and frequency for feature augmentation. Compared with eleven advanced models across four datasets, UniRec outperforms SOTA models significantly. The code is available at https://github.com/Linxi000/UniRec.
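As an illustration of scoring sequence uniformity from raw timestamps (UniRec's precise definition is not given in this abstract, so the inverse coefficient of variation below is an assumption):

```python
import numpy as np

def interval_uniformity(timestamps) -> float:
    """Score in (0, 1]: 1 means perfectly even spacing; lower values mean burstier sequences."""
    gaps = np.diff(np.sort(np.asarray(timestamps, dtype=float)))
    if gaps.size == 0 or gaps.mean() == 0:
        return 1.0
    return float(1.0 / (1.0 + gaps.std() / gaps.mean()))   # inverse coefficient of variation
```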
Knowledge Graph Embedding (KGE), which projects entities and relations into continuous vector spaces, has garnered significant attention. Although high-dimensional KGE methods offer better performance, they come at the expense of significant computation and memory overheads. Decreasing embedding dimensions significantly deteriorates model performance. While several recent efforts utilize knowledge distillation or non-Euclidean representation learning to augment the effectiveness of low-dimensional KGE, they either necessitate a pre-trained high-dimensional teacher model or involve complex non-Euclidean operations, thereby incurring considerable additional computational costs. To address this, this work proposes Confidence-aware Self-Knowledge Distillation (CSD) that learns from the model itself to enhance KGE in a low-dimensional space. Specifically, CSD extracts knowledge from embeddings in previous iterations, which would be utilized to supervise the learning of the model in the next iterations. Moreover, a specific semantic module is developed to filter reliable knowledge by estimating the confidence of previously learned embeddings. This straightforward strategy bypasses the need for time-consuming pre-training of teacher models and can be integrated into various KGE methods to improve their performance. Our comprehensive experiments on six KGE backbones and four datasets underscore the effectiveness of the proposed CSD.
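A minimal sketch of confidence-aware self-distillation as described above, with hypothetical tensor names: the previous iteration's detached score distribution supervises the current one, weighted per example by an estimated confidence.

```python
import torch
import torch.nn.functional as F

def csd_loss(curr_scores: torch.Tensor,
             prev_scores: torch.Tensor,
             confidence: torch.Tensor,
             temperature: float = 1.0) -> torch.Tensor:
    """Distill the previous iteration's (detached) candidate-score distribution into the
    current model, down-weighting examples whose past predictions look unreliable."""
    teacher = F.softmax(prev_scores.detach() / temperature, dim=-1)
    student = F.log_softmax(curr_scores / temperature, dim=-1)
    kl = F.kl_div(student, teacher, reduction="none").sum(dim=-1)   # per-example KL divergence
    return (confidence * kl).mean()                                 # confidence values lie in [0, 1]
```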
With the development of multimedia systems, multimodal recommendations are playing an essential role, as they can leverage rich contexts beyond interactions. Existing methods mainly regard multimodal information as an auxiliary, using it to help learn ID features. However, there exist semantic gaps between multimodal content features and ID-based features, so directly using multimodal information as an auxiliary would lead to misalignment in the representations of users and items. In this paper, we first systematically investigate the misalignment issue in multimodal recommendations, and propose a solution named AlignRec. In AlignRec, the recommendation objective is decomposed into three alignments, namely alignment within contents, alignment between content and categorical ID, and alignment between users and items. Each alignment is characterized by a specific objective function and is integrated into our multimodal recommendation framework. To effectively train AlignRec, we propose starting from pre-training the first alignment to obtain unified multimodal features and subsequently training the following two alignments together with these features as input. As it is essential to analyze whether each multimodal feature helps in training and to accelerate the iteration cycle of recommendation models, we design three new classes of metrics to evaluate intermediate performance. Our extensive experiments on three real-world datasets consistently verify the superiority of AlignRec compared to nine baselines. We also find that the multimodal features generated by AlignRec are better than currently used ones, and they will be open-sourced in our repository https://github.com/sjtulyf123/AlignRec_CIKM24.
The ubiquity of missing data has sparked considerable attention and focus on tabular data imputation methods. Diffusion models, recognized as the cutting-edge technique for data generation, demonstrate significant potential in tabular data imputation tasks. However, in pursuit of diversity, vanilla diffusion models often exhibit sensitivity to initialized noises, which hinders the models from generating stable and accurate imputation results. Additionally, the sparsity inherent in tabular data poses challenges for diffusion models in accurately modeling the data manifold, impacting the robustness of these models for data imputation. To tackle these challenges, this paper introduces an advanced diffusion model named <u>S</u>elf-supervised <u>imp</u>utation <u>D</u>iffusion <u>M</u>odel (SimpDM for brevity), specifically tailored for tabular data imputation tasks. To mitigate sensitivity to noise, we introduce a self-supervised alignment mechanism that aims to regularize the model, ensuring consistent and stable imputation predictions. Furthermore, we introduce a carefully devised state-dependent data augmentation strategy within SimpDM, enhancing the robustness of the diffusion model when dealing with limited data. Extensive experiments demonstrate that SimpDM matches or outperforms state-of-the-art imputation methods across various scenarios.
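The self-supervised alignment mechanism can be pictured as a consistency loss between two imputations of the same masked table; the `denoiser` interface below is assumed for illustration and is not SimpDM's actual API.

```python
import torch
import torch.nn.functional as F

def alignment_loss(denoiser, x_obs, mask, t):
    """Impute the same masked table twice from two independent noise draws and penalize
    disagreement on the missing cells, encouraging noise-insensitive imputations.
    `denoiser(x_obs, mask, noise, t)` is an assumed interface, not the paper's API."""
    noise_a, noise_b = torch.randn_like(x_obs), torch.randn_like(x_obs)
    pred_a = denoiser(x_obs, mask, noise_a, t)
    pred_b = denoiser(x_obs, mask, noise_b, t)
    missing = 1.0 - mask                       # mask == 1 where a value is observed
    diff = (pred_a - pred_b) * missing
    return diff.pow(2).sum() / missing.sum().clamp(min=1.0)
```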
With the widespread use of GPS devices and the advancement of location-based services, a vast amount of trajectory data has been collected and mined for various applications. Trajectory clustering, which categorizes trajectories into distinct groups, is a fundamental functionality of trajectory data mining. The challenge is how to cluster massive trajectory data efficiently and universally while obtaining satisfactory results. Raw-trajectory clustering algorithms are universal but caught in a dilemma between efficiency and result quality. Other approaches, such as density-based, road network-based, and deep learning-based algorithms, encounter issues like high time complexity, loss of trajectory integrity, reliance on road networks, and sensitivity to data quality during training. To tackle these challenges, we first propose the efficient KMCT (k-Means Clustering of Trajectories) algorithm based on a semantic interpolation transformation to cluster raw trajectories and achieve satisfactory results. Additionally, we introduce the DA-KMCT (Density Accelerated k-Means Clustering of Trajectories) algorithm to further boost the clustering process based on trajectory densities and an optimized centroid selection strategy. Moreover, we present a novel clustering evaluation method called IOD, which efficiently estimates clustering results on large-scale datasets with linear time complexity. Experimental results on real-world datasets demonstrate that KMCT and DA-KMCT outperform five related methods in terms of clustering quality and time efficiency, and the proposed IOD evaluation shows a strong correlation with the Silhouette Coefficient, offering a reliable and efficient alternative for evaluating clustering results.
The prediction of stock prices is a highly sought-after topic in the data mining field. In recent decades, many promising methods have been proposed and widely adopted for stock price prediction. However, these methods have inherent limitations, such as low accuracy, lack of transparency, and failure to consider the interactions among stock factors. To address these issues, we propose a UNIversal and interpretable framework for enhancing Stock Price Prediction (abbreviated to UniSPP), which is capable of modeling the interactions among stock factors. UniSPP first builds a fully connected graph, where the nodes and edges are the stock factors and interactions between them, respectively. However, it is a non-trivial task to discover a proper feature interaction subgraph from a large space, especially in discrete graph modeling. Therefore, UniSPP proposes a novel idea to mine the real factor interactions by iteratively sampling subgraphs and optimizing the sampling controller. Empirical studies show that our framework can be incorporated with many popular forecasting models and can effectively discover the suitable factor interaction, which can significantly improve the prediction results of existing models.
Next Set Recommendation (NSRec), encompassing related tasks such as next basket recommendation and temporal sets prediction, stands as a trending research topic. Although numerous attempts have been made on this topic, there are certain drawbacks: (i) Existing studies are still confined to utilizing objective functions commonly found in Next Item Recommendation (NIRec), such as binary cross entropy and BPR, which are calculated based on individual item comparisons; (ii) They place emphasis on building sophisticated learning models to capture intricate dependency relationships across sequential sets, but frequently overlook pivotal dependencies in their objective functions; (iii) The diversity factor within sequential sets is frequently overlooked. In this research, we endeavor to unveil a universal and Sets-level optimization framework for Next Set Recommendation (SNSRec), offering a holistic fusion of diversity distribution and intricate dependency relationships within temporal sets. To realize this, the following contributions are made: (i) We directly model the temporal set in a sequence as a cohesive entity, leveraging the Structured Determinantal Point Process (SDPP), wherein the probabilistic DPP distribution prioritizes collections of structures (sequential sets) instead of individual items; (ii) We introduce a co-occurrence representation to discern and acknowledge the importance of different sets; (iii) We propose a sets-level optimization criterion, which integrates the diversity distribution and dependency relations across the entire sequence of sets, guiding the model to recommend relevant and diversified sets. Extensive experiments on real-world datasets show that our approach consistently outperforms previous methods on both relevance and diversity.
Unsupervised anomaly detection in time series is essential in industrial applications, as it significantly reduces the need for manual intervention. Multivariate time series pose a complex challenge due to their feature and temporal dimensions. Traditional methods use Graph Neural Networks (GNNs) or Transformers to analyze spatial dependencies, while RNNs model temporal dependencies. These methods focus narrowly on one dimension or engage in coarse-grained feature extraction, which can be inadequate for large datasets characterized by intricate relationships and dynamic changes. This paper introduces a novel temporal model built on an enhanced Graph Attention Network (GAT) for multivariate time series anomaly detection, called TopoGDN. Our model analyzes both time and feature dimensions from a fine-grained perspective. First, we introduce a multi-scale temporal convolution module to extract detailed temporal features. Additionally, we present an augmented GAT to manage complex inter-feature dependencies, which incorporates graph topology into node features across multiple scales, a versatile, plug-and-play enhancement that significantly boosts the performance of GAT. Our experimental results confirm that our approach surpasses the baseline models on four datasets, demonstrating its potential for widespread application in fields requiring robust anomaly detection. The code is available at https://github.com/ljj-cyber/TopoGDN.
Despite the success of conventional collaborative filtering (CF) approaches for recommendation systems, they exhibit limitations in leveraging semantic knowledge within the textual attributes of users and items. Recent focus on the application of large language models for recommendation (LLM4Rec) has highlighted their capability for effective semantic knowledge capture. However, these methods often overlook the collaborative signals in user behaviors. Some simply instruct-tune a language model, while others directly inject the embeddings of a CF-based model, lacking a synergistic fusion of different modalities. To address these issues, we propose a framework of Collaborative Cross-modal Fusion with Large Language Models, termed CCF-LLM, for recommendation. In this framework, we translate the user-item interactions into a hybrid prompt to encode both semantic knowledge and collaborative signals, and then employ an attentive cross-modal fusion strategy to effectively fuse latent embeddings of both modalities. Extensive experiments demonstrate that CCF-LLM outperforms existing methods by effectively utilizing semantic and collaborative signals in the LLM4Rec context.
The task of multi-behavioral sequential recommendation (MBSR) has grown in importance in personalized recommender systems, aiming to incorporate behavior types of interactions for better recommendations. Existing approaches focus on the next-item prediction objective, neglecting the value of integrating the target behavior type into the learning objective. In this paper, we propose MBGen, a novel Multi-Behavioral sequential Generative recommendation framework. We model the MBSR task as a consecutive two-step process: (1) given item sequences, MBGen first predicts the next behavior type to frame the user intention; (2) given item sequences and a target behavior type, MBGen then predicts the next items. To model such a two-step process, we tokenize both behaviors and items into tokens and construct one single token sequence with behaviors and items placed interleaved. Furthermore, we design a unified generative recommendation paradigm that learns to autoregressively generate the next behavior and item tokens, naturally enabling a multi-task capability. Additionally, we exploit the heterogeneous nature of token sequences in generative recommendation and propose a position-routed sparse architecture to efficiently scale up models under the generative recommendation paradigm. Extensive experiments on real-world public datasets demonstrate that MBGen significantly outperforms existing MBSR models across multiple tasks.
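The interleaved tokenization step can be sketched in a few lines; the offsets and ordering below are illustrative assumptions rather than MBGen's exact vocabulary layout.

```python
def interleave_tokens(items, behaviors, item_offset=1000, behavior_offset=0):
    """Emit one flat token sequence: each step contributes a behavior token (framing the
    intention) followed by the item token, with offsets keeping the vocabularies disjoint."""
    seq = []
    for item_id, behavior_id in zip(items, behaviors):
        seq.append(behavior_offset + behavior_id)   # behavior token first
        seq.append(item_offset + item_id)           # item token conditioned on it
    return seq

# interleave_tokens(items=[12, 7], behaviors=[1, 0]) -> [1, 1012, 0, 1007]
```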
Molecular property prediction stands as a cornerstone task in AI-driven drug design and discovery, wherein the atoms within a molecule serve as nodes, collectively forming a graph with bonds acting as edges. Given the crucial role of geometric structures in molecular property prediction, the integration of 3D information with various graph learning methods has been explored to enhance prediction performance. Despite the increasing adoption of the "Graph pre-training and fine-tuning" paradigm to refine molecular representations, a significant challenge persists due to the misalignment between pre-training objectives and downstream tasks. Drawing inspiration from prompt tuning techniques in Natural Language Processing (NLP), several graph prompt-based methods have emerged. However, existing approaches tend to overlook the unique properties inherent in molecular graphs. To address this gap, our paper introduces a novel approach named 3D <u>MO</u>lecul<u>A</u>r promp<u>T</u> (MOAT), designed specifically for geometric molecules. Specifically, we propose atom-level prompts to capture atom distribution, geometry-level prompts tailored for molecular conformers, where different conformations have distinct chemical properties, and task-level prompts to leverage functional group properties. Results on both 3D and 2D downstream tasks demonstrate its ability to successfully bridge the data gap across diverse settings. To the best of our knowledge, this paper is the first attempt to introduce geometric graph-prompting learning for molecules.
Exploring the hierarchical structure of graphs presents notable advantages for graph analysis, revealing insights ranging from individual vertex behavior to community distribution and overall graph stability. This paper studies hierarchical structures within hypergraphs, where a hyperedge can connect multiple vertices. We observed that directly extending hierarchical frameworks from pairwise graphs to hypergraphs overlooks high-order interactions and can result in either high computational complexity or sparse hierarchy structure. To address this challenge, we introduce a dual-layer hypergraph hierarchy consisting of a primary hierarchy and a secondary hierarchy, enabling the construction of a refined hypergraph hierarchy in linear time. The dual-layer hierarchy establishes a global hierarchy based on vertex cohesion, utilizing vertex-induced subhypergraphs, and a local hierarchy based on hyperedge containment, employing edge-induced subhypergraphs. The combination of global and local hierarchy mitigates the homogeneity and sparsity issues inherent in single-layer hierarchies, allowing more effective modeling of high-order interactions. Furthermore, we propose an efficient hierarchical construction algorithm by leveraging a novel hyperedge-based disjoint set to identify connected subhypergraphs. Additionally, to optimize the local hierarchy further and prevent the emergence of excessively redundant levels, we introduce a compact local hierarchy by defining a restricted subgraph metric to eliminate redundancy caused by large-sized hyperedges. Empirical studies on real-world hypergraphs demonstrate the effectiveness of our approach.
Root cause analysis for faults is one of the core tasks in the operation and maintenance of communication networks. Although artificial intelligence techniques can be used to assist manual inspections, fault diagnosis is still a tough problem. In the scenario of fault root cause localization, insufficient associated information makes it difficult to accurately determine the root cause. Meanwhile, it is a big challenge to extract as many feature details as possible from limited information and then fully utilize them. Therefore, this paper proposes a Knowledge-Enhanced Transformer-FL method, namely, KETrans-FL, to address the problem of root cause localization, by treating it as a multi-class classification problem. Our method first constructs a knowledge graph for knowledge enhancement, which consists of four types of nodes (base station, alarm, fault and alarm level) and their relationships based on the network operations. This knowledge enhancement technique incorporates real operation data and other external knowledge (e.g., alarm level) to serve as the source of feature information and thus can extract statistical and embedded features. Then, our method designs a Trans-FL (Transformer-Focal Loss) model, which uses the adapted Transformer encoder to learn the correlation information between input features to generate classification probabilities and employs Focal Loss as the loss function to mitigate severe class imbalance in the multi-class classification problem. Experimental results show that our proposed KETrans-FL method achieves a classification accuracy of nearly 91% and an average AUC score of 93%, indicating a significant improvement on fault root cause localization compared to baseline models. In addition, experimental results also validate the remarkable effect of our knowledge enhancement technique on improving the final classification accuracy.
Generative Adversarial Networks (GANs), as a cornerstone of artificial intelligence (AI), are widely recognized as the intellectual property (IP) of their owners, given the sensitivity of the training data and the commercial value tied to the models. Model extraction attacks, which aim to steal well-trained proprietary models, pose a significant threat to model IP. Nevertheless, current research predominantly focuses on the context of machine learning as a service (MLaaS), where the emphasis lies in understanding the attack knowledge acquired through black-box API queries. This restricted perspective exposes a critical gap in investigating model extraction attacks within realistic distributed settings for generative tasks. In this work, we present the first investigation into model extraction attacks against GANs in distributed settings. We provide a comprehensive attack taxonomy, considering three different levels of knowledge the adversary can obtain in practice. Based on it, we introduce a novel model extraction attack named MoEx, which focuses on the GAN-based distributed learning scenario, i.e., Multi-Discriminator GANs, a typical asymmetric distributed setting. MoEx uses the objective function simulation, leveraging data exchanged during the learning process, to approximate the GAN generator owned by the server. We define two attack goals for MoEx, fidelity extraction and accuracy extraction. Then we comprehensively evaluate the effectiveness of MoEx's two goals with real-world datasets. Our results demonstrate its robust capabilities in extracting generators with high fidelity and accuracy compared with existing methods.
A data void is a gap in online information, providing an opportunity for the spread of disinformation or a data void exploit. We introduce lightweight measures to track the progress of data void exploits and mitigation efforts in two contexts: Web search and Knowledge Graph (KG) querying. We use case studies to demonstrate the viability of these measures as data void trackers in the Web search context. To tackle data voids, we introduce an adversarial game model involving two agents: a disinformer and a mitigator. Both agents insert content into the information ecosystem to have their narrative rank higher than their counterpart in search results. At every turn, each agent chooses which content to deploy within their resource constraints, mimicking real-world situations where different entities have varying levels of influence and access to resources. Using simulations of this game, we compare and evaluate different mitigation strategies to recommend ones that maximize mitigation impact while minimizing costs.
Exposure bias is a well-known issue in recommender systems where items and suppliers are not equally represented in the recommendation results. This bias becomes particularly problematic over time as a few items are repeatedly over-represented in recommendation lists, leading to a feedback loop that further amplifies this bias. Although extensive research has addressed this issue in model-based or neighborhood-based recommendation algorithms, less attention has been paid to online recommendation models, such as those based on top-K contextual bandits, where recommendation models are dynamically updated with ongoing user feedback. In this paper, we study exposure bias in a class of well-known contextual bandit algorithms known as Linear Cascading Bandits. We analyze these algorithms in their ability to handle exposure bias and provide a fair representation of items in the recommendation results. Our analysis reveals that these algorithms fail to mitigate exposure bias in the long run during the course of ongoing user interactions. We propose an Exposure-Aware reward model that updates the model parameters based on two factors: 1) implicit user feedback and 2) the position of the item in the recommendation list. The proposed model mitigates exposure bias by controlling the utility assigned to the items based on their exposure in the recommendation list. Our experiments with two real-world datasets show that our proposed reward model improves the exposure fairness of the linear cascading bandits over time while maintaining the recommendation accuracy. It also outperforms the current baselines. Finally, we prove a high probability upper regret bound for our proposed model, providing theoretical guarantees for its performance.
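The abstract does not give the exact form of the Exposure-Aware reward, so the snippet below is only a hedged sketch of the general idea: the utility credited to an item depends on both the observed click feedback and the item's position (and hence exposure) in the ranked list. The logarithmic discounting scheme and the parameter names are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def exposure_aware_rewards(clicks, beta=0.5):
    """Assign a per-item reward from implicit feedback and list position.

    clicks: list of 0/1 click indicators for a ranked recommendation list,
            index 0 being the top position (highest exposure).
    beta:   strength of the exposure penalty (illustrative parameter).

    Items shown near the top already received more exposure, so their utility
    is discounted accordingly, nudging the bandit away from repeatedly
    over-exposing the same items.
    """
    rewards = []
    for pos, clicked in enumerate(clicks):
        exposure = 1.0 / np.log2(pos + 2)            # position-based exposure (DCG-style)
        reward = float(clicked) - beta * exposure    # utility net of exposure already granted
        rewards.append(reward)
    return rewards


if __name__ == "__main__":
    print(exposure_aware_rewards([1, 0, 0, 1]))
```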
In this paper, we discuss the potential costs that emerge from using a Knowledge Graph (KG) in entity-oriented search without considering its data veracity. We argue for the need for KG veracity analysis to gain insights and propose a scalable assessment framework. Previous assessments focused on relevance, assuming correct KGs, and overlooking the potential risks of misinformation. Our approach strategically allocates annotation resources, optimizing utility and revealing the significant impact of veracity on entity search and card generation. Contributions include a fresh perspective on entity-oriented search extending beyond the conventional focus on relevance, a scalable assessment framework, exploratory experiments highlighting the impact of veracity on ranking and user experience, as well as outlining associated challenges and opportunities.
The article presents a simple yet optimal approach to compute aggregates of window queries over data streams. The proposal is built as a fully-fledged pipeline of operators that are literal transcripts of mathematical definitions. Its main features are an application of prefix sums and a well-founded un-/slicing technique. The overall process is linear in the number of events and windows, and it takes quasi-constant space for a mix of periodic windows, while remaining applicable to multiple deterministic window queries. The limitations are twofold: events must arrive in order, and the aggregation function must be a left-cancellative monoid.
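As a concrete illustration of the prefix-sum idea for a left-cancellative monoid, the sketch below answers arbitrary deterministic windows over an in-order stream in O(1) per window after a single linear pass. It is a toy reduction of the pipeline described above, using ordinary addition as the monoid; the function name and window representation are illustrative.

```python
from itertools import accumulate

def window_aggregates(events, windows):
    """Aggregate (sum) each [start, end) window over an in-order stream.

    events:  list of numeric event values, already in order
    windows: list of (start, end) index pairs, end exclusive

    One linear pass builds prefix sums; each window is then answered in
    constant time as prefix[end] - prefix[start], which relies on the
    aggregation being (left-)cancellative, as noted above.
    """
    prefix = [0] + list(accumulate(events))
    return [prefix[end] - prefix[start] for start, end in windows]


if __name__ == "__main__":
    stream = [3, 1, 4, 1, 5, 9, 2, 6]
    # three overlapping windows of length 4 (e.g., a periodic sliding window)
    print(window_aggregates(stream, [(0, 4), (2, 6), (4, 8)]))
```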
Federated Class Incremental Learning (FCIL) is a new direction in continual learning (CL) for addressing catastrophic forgetting and non-IID data distribution simultaneously. Existing FCIL methods call for high communication costs and exemplars from previous classes. We propose a novel rehearsal-free method for FCIL named prototypes-injected prompt (PIP) that involves 3 main ideas: a) prototype injection on prompt learning, b) prototype augmentation, and c) weighted Gaussian aggregation on the server side. Our experimental results show that the proposed method outperforms the current state-of-the-art methods (SOTAs) with a significant improvement (up to 33%) on the CIFAR100, MiniImageNet, and TinyImageNet datasets. Our extensive analysis demonstrates the robustness of PIP across different task sizes, as well as its advantage of requiring fewer participating local clients and fewer global rounds. For further study, the source code of PIP, the baselines, and experimental logs are shared publicly at https://github.com/anwarmaxsum/PIP.
As machine learning (ML) models and datasets increase in complexity, the demand for methods that enhance explainability and interpretability becomes paramount. Prototypes, by encapsulating essential characteristics within data, offer insights that enable tactical decision-making and enhance transparency. Traditional prototype methods often rely on sub-symbolic raw data and opaque latent spaces, reducing explainability and increasing the risk of misinterpretations. This paper presents a novel framework that utilizes semantic descriptions to define prototypes and provide clear explanations, effectively addressing the shortcomings of conventional methods. Our approach leverages concept-based descriptions to cluster data on the semantic level, ensuring that prototypes not only represent underlying properties intuitively but are also straightforward to interpret. Our method simplifies the interpretative process and effectively bridges the gap between complex data structures and human cognitive processes, thereby enhancing transparency and fostering trust. Our approach outperforms existing widely-used prototype methods in facilitating human understanding and informativeness, as validated through a user survey.
Signed Graph Neural Networks (SGNNs) have recently gained attention as an effective tool for several learning tasks on signed networks, i.e., graphs where edges have an associated polarity. One of these tasks is to predict the polarity of the links for which this information is missing, starting from the network structure and the other available polarities. However, when the available polarities are few and potentially noisy, such a task becomes challenging.
In this work, we devise a semi-supervised learning framework that builds around the novel concept of multiscale social balance to improve the prediction of link polarities in settings characterized by limited data quantity and quality. Our model-agnostic approach can seamlessly integrate with any SGNN architecture, dynamically reweighting the importance of each data sample while making strategic use of the structural information from unlabeled edges combined with social balance theory.
Empirical validation demonstrates that our approach outperforms established baseline models, effectively addressing the limitations imposed by noisy and sparse data. This result underlines the benefits of incorporating multiscale social balance into SGNNs, opening new avenues for robust and accurate predictions in signed network analysis.
Conversational search supports multi-turn user-system interactions to solve complex information needs. Different from the traditional single-turn ad-hoc search, conversational search encounters a more challenging problem of context-dependent query understanding with the lengthy and long-tail conversational history context. While conversational query rewriting (CQR) methods leverage explicit rewritten queries to train a rewriting model to transform the context-dependent query into a stand-alone search query, this is usually done without considering the quality of search results. Conversational dense retrieval (CDR) methods use fine-tuning to improve a pre-trained ad-hoc query encoder, but they are limited by the conversational search data available for training. In this paper, we leverage both rewritten queries and relevance judgments in the conversational search data to train a better query representation model. The key idea is to align the query representation with those of rewritten queries and relevant documents. The proposed model -- Query Representation Alignment Conversational Dense Retriever, QRACDR, is tested on eight datasets, including various settings in conversational search and ad-hoc search. The results demonstrate the strong performance of QRACDR compared with other state-of-the-art methods, and confirm the effectiveness of representation alignment.
In the rapidly evolving landscape of online recipe sharing within a globalized context, there has been a notable surge in research towards comprehending and generating food recipes. Recent advancements in large language models (LLMs) like GPT-2 and LLaVA have paved the way for Natural Language Processing (NLP) approaches to delve deeper into various facets of food-related tasks, encompassing ingredient recognition and comprehensive recipe generation. Despite impressive performance and multi-modal adaptability of LLMs, domain-specific training remains paramount for their effective application. This work evaluates existing LLMs for recipe generation and proposes LLaVA-Chef, a novel model trained on a curated dataset of diverse recipe prompts in a multi-stage approach. First, we refine the mapping of visual food image embeddings to the language space. Second, we adapt LLaVA to the food domain by fine-tuning it on relevant recipe data. Third, we utilize diverse prompts to enhance the model's recipe comprehension. Finally, we improve the linguistic quality of generated recipes by penalizing the model with a custom loss function. LLaVA-Chef demonstrates impressive improvements over pretrained LLMs and prior works. A detailed qualitative analysis reveals that LLaVA-Chef generates more detailed recipes with precise ingredient mentions, compared to existing approaches.
Recent research in inductive reasoning has focused on predicting missing links between entities that are not observed during training. However, most approaches usually require that the relations are known at the inference time. In the real world, new entities and new relations usually emerge concurrently, which greatly challenges the model's generalization ability. In this paper, we propose a novel inductive knowledge graph embedding model that effectively handles unknown entities and relations by capturing their local structural features. Specifically, a relation graph is constructed to learn relation representations. In the relation graph, we employ a four-dimensional vector to represent the interaction patterns between nodes (relations), where each dimension corresponds to a specific type of interaction. For entity representations, our model dynamically initializes entity features using relation features and attentively aggregates neighboring features of entities to update entity features. By modeling interaction patterns between relations and incorporating structural information of entities, our model learns how to aggregate neighboring embeddings using attention mechanisms, thus generating high-quality embeddings for new entities and relations. Extensive experiments on benchmark datasets demonstrate that our model outperforms state-of-the-art methods, particularly in scenarios involving completely new relations.
Fake news detection plays a crucial role in protecting social media users and maintaining a healthy news ecosystem. Among existing works, comment-based fake news detection methods are empirically shown as promising because comments could reflect users' opinions, stances, and emotions and deepen models' understanding of fake news. Unfortunately, due to exposure bias and users' different willingness to comment, it is not easy to obtain diverse comments in reality, especially for early detection scenarios. Without obtaining the comments from the "silent" users, the perceived opinions may be incomplete, subsequently affecting news veracity judgment. In this paper, we explore the possibility of finding an alternative source of comments to guarantee the availability of diverse comments, especially those from silent users. Specifically, we propose to adopt large language models (LLMs) as a user simulator and comment generator, and design GenFEND, a generated feedback-enhanced detection framework, which generates comments by prompting LLMs with diverse user profiles and aggregating generated comments from multiple subpopulation groups. Experiments demonstrate the effectiveness of GenFEND and further analysis shows that the generated comments cover more diverse users and could even be more effective than actual comments.
Identifying the regions of a learning resource that a learner pays attention to is crucial for assessing the material's impact and improving its design and related support systems. Saliency detection in videos addresses the automatic recognition of attention-drawing regions in single frames. In educational settings, the recognition of pertinent regions in a video's visual stream can enhance content accessibility and information retrieval tasks such as video segmentation, navigation, and summarization. Such advancements can pave the way for the development of advanced AI-assisted technologies that support learning with greater efficacy. However, this task becomes particularly challenging for educational videos due to the combination of unique characteristics such as text, voice, illustrations, animations, and more. To the best of our knowledge, there is currently no study that evaluates saliency detection approaches in educational videos. In this paper, we address this gap by evaluating four state-of-the-art saliency detection approaches for educational videos. We reproduce the original studies and explore the replication capabilities for general-purpose (non-educational) datasets. Then, we investigate the generalization capabilities of the models and evaluate their performance on educational videos. We conduct a comprehensive analysis to identify common failure scenarios and possible areas of improvement. Our experimental results show that educational videos remain a challenging context for generic video saliency detection models.
The Fair Graph Anomaly Detection (FairGAD) problem aims to accurately detect anomalous nodes in an input graph while avoiding biased predictions against individuals from sensitive subgroups. However, the current literature does not comprehensively discuss this problem, nor does it provide realistic datasets that encompass actual graph structures, anomaly labels, and sensitive attributes. To bridge this gap, we introduce a formal definition of the FairGAD problem and present two novel datasets constructed from the social media platforms Reddit and Twitter. These datasets comprise 1.2 million and 400,000 edges associated with 9,000 and 47,000 nodes, respectively, and leverage political leanings as sensitive attributes and misinformation spreaders as anomaly labels. We demonstrate that our FairGAD datasets significantly differ from the synthetic datasets used by the research community. Using our datasets, we investigate the performance-fairness trade-off in nine existing GAD and non-graph AD methods combined with five state-of-the-art fairness methods. Code and datasets are available at https://github.com/nigelnnk/FairGAD.
We study the problem of continual test-time adaptation where the goal is to adapt a source pre-trained model to a sequence of unlabelled target domains at test time. Existing methods on test-time training suffer from several limitations: (1) Mismatch between the feature extractor and classifier; (2) Interference between the main and self-supervised tasks; (3) Lack of the ability to quickly adapt to the current distribution. In light of these challenges, we propose a cascading paradigm that simultaneously updates the feature extractor and classifier at test time, mitigating the mismatch between them and enabling long-term model adaptation. The pre-training of our model is structured within a meta-learning framework, thereby minimizing the interference between the main and self-supervised tasks and encouraging fast adaptation in the presence of limited unlabelled data. Additionally, we introduce innovative evaluation metrics, average accuracy and forward transfer, to effectively measure the model's adaptation capabilities in dynamic, real-world scenarios. Extensive experiments and ablation studies demonstrate the superiority of our approach in a range of tasks including image classification, text classification, and speech recognition. Our code is publicly available at https://github.com/Nyquixt/CascadeTTA.
Despite recent progress, large language models (LLMs) still face the challenge of appropriately reacting to the intricacies of social and cultural conventions. This paper presents Mango, a methodology for distilling high-accuracy, high-recall assertions of cultural knowledge. We judiciously and iteratively prompt LLMs for this purpose from two entry points, concepts and cultures. Outputs are consolidated via clustering and generative summarization. Running the Mango method with GPT-3.5 as underlying LLM yields 167K high-accuracy assertions for 30K concepts and 11K cultures, surpassing prior resources by a large margin in quality and size. In an extrinsic evaluation for intercultural dialogues, we explore augmenting dialogue systems with cultural knowledge assertions. Notably, despite LLMs inherently possessing cultural knowledge, we find that adding knowledge from Mango improves the overall quality, specificity, and cultural sensitivity of dialogue responses, as judged by human annotators. Data and code are available for download.
Recently, graph neural networks (GNNs) have demonstrated outstanding performance in fundamental tasks such as node classification and link prediction, as well as in specialized domains like recommendation systems, fraud detection, and drug discovery. However, their vulnerability to adversarial attacks raises concerns about their reliability in security-critical areas. To address this issue, researchers are exploring various defense methods, including specific attack countermeasures and certifiable robustness approaches. Nevertheless, these strategies are often effective only against limited attack scenarios, and prevailing certification methods prove inadequate when confronted with injection attacks. In this paper, we propose a method named CERT_UIA to enhance the robustness of GNN models against worst-case attacks, specifically targeting the scenario of <u>U</u>niversal node <u>I</u>njection <u>A</u>ttacks (UIA), thereby filling a gap in the existing literature on certified robustness in this context. Our approach involves a two-stage attack process that replaces the transformations of the topology and feature spaces with equivalent unified feature transformations, unifying the optimization of worst-case perturbations into a single feature space. Furthermore, we empirically evaluate our method on several benchmark datasets and compare it with existing certified methods.
The growth of online social networks (OSNs) has become increasingly significant. Potential cloned accounts on these platforms raise serious concerns due to the risks they pose to user privacy and security. Previous works in the detection of cloned accounts on OSNs do not yield satisfactory results and lack consideration of the impact of missing attributes on the detection process. We propose a cloned account detection with imputation framework for online social networks (CADIF-OSN) to accurately find potential cloned accounts on OSNs. This framework enables the accurate identification of potential cloned accounts on OSNs by leveraging their public profile information, even in cases where some of the information may not be accessible. The framework comprises four key components: 1) Fuzzy string matching with Levenshtein Distance that quickly generates suspicious account pairs by matching all the accounts' usernames and screen names; 2) An embedding method, Doc2Vec, that transforms all existing profile information of accounts into estimable vectors; 3) A HyperImpute model that imputes the missing information; and 4) A deep-forest model that is trained to detect cloned accounts. We evaluated our framework using a Twitter dataset consisting of 3,826 pairs of cloned accounts and 70,000 normal accounts. The evaluation results demonstrate that our framework significantly surpasses existing approaches in terms of Precision and F1-score.
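The first component above (fuzzy matching of usernames and screen names via Levenshtein distance) is a standard technique; a minimal sketch is shown below. The threshold, field names, and pair-generation loop are illustrative, not those used in CADIF-OSN, which presumably uses a more scalable candidate-generation scheme.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def suspicious_pairs(accounts, max_dist=2):
    """Return account pairs whose usernames are within max_dist edits."""
    pairs = []
    for i in range(len(accounts)):
        for j in range(i + 1, len(accounts)):
            if levenshtein(accounts[i]["username"], accounts[j]["username"]) <= max_dist:
                pairs.append((accounts[i]["id"], accounts[j]["id"]))
    return pairs


if __name__ == "__main__":
    accs = [{"id": 1, "username": "alice_w"},
            {"id": 2, "username": "al1ce_w"},
            {"id": 3, "username": "bob1990"}]
    print(suspicious_pairs(accs))  # -> [(1, 2)]
```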
Text-aware recommender systems incorporate rich textual features, such as titles and descriptions, to generate item recommendations for users. The use of textual features helps mitigate cold-start problems, and thus, such recommender systems have attracted increased attention. However, we argue that the dependency on item descriptions makes the recommender system vulnerable to manipulation by adversarial sellers on e-commerce platforms. In this paper, we explore the possibility of such manipulation by proposing a new text rewriting framework to attack text-aware recommender systems. We show that the rewriting attack can be exploited by sellers to unfairly uprank their products, even though the adversarially rewritten descriptions are perceived as realistic by human evaluators. Methodologically, we investigate two different variations to carry out text rewriting attacks: (1) two-phase fine-tuning for greater attack performance, and (2) in-context learning for higher text rewriting quality. Experiments spanning 3 different datasets and 4 existing approaches demonstrate that recommender systems exhibit vulnerability against the proposed text rewriting attack. Our work adds to the existing literature around the robustness of recommender systems, while highlighting a new dimension of vulnerability in the age of large-scale automated text generation.
To obtain a foundational understanding of timeline algorithms and viral content in shaping public opinions, computer scientists started to study augmented versions of opinion formation models from sociology. In this paper, we generalize the popular Friedkin--Johnsen model to include the effects of external media sources on opinion formation. Our goal is to mathematically analyze the influence of biased media, arising from factors such as manipulated news reporting or the phenomenon of false balance. Within our framework, we examine the scenario of two opposing media sources, which do not adapt their opinions like ordinary nodes, and analyze the conditions and the number of periods required for radicalizing the opinions in the network. When both media sources possess equal influence, we theoretically characterize the final opinion configuration. In the special case where there is only a single media source present, we prove that media sources which do not adapt their opinions are significantly more powerful than those which do. Lastly, we conduct experiments on real-world and synthetic datasets, showing that our theoretical guarantees closely align with experimental simulations.
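For readers unfamiliar with the Friedkin--Johnsen dynamics being generalized here, the sketch below simulates a toy version with two non-adapting media nodes: ordinary nodes mix a weighted average of their neighbors' opinions with their innate opinion, while media nodes keep their stance fixed. The weights, stubbornness value, and media-coupling scheme are simplifying assumptions for illustration, not the paper's exact model.

```python
import numpy as np

def simulate_fj_with_media(W, innate, media_idx, stubbornness=0.5, steps=100):
    """Friedkin--Johnsen-style dynamics with non-adapting media nodes.

    W:            (n, n) row-stochastic influence matrix (includes media columns)
    innate:       (n,) innate opinions; for media nodes this is their fixed stance
    media_idx:    indices of media nodes, which never update their opinion
    stubbornness: weight ordinary nodes place on their innate opinion
    """
    innate = np.asarray(innate, dtype=float)
    x = innate.copy()
    for _ in range(steps):
        x_new = stubbornness * innate + (1 - stubbornness) * (W @ x)
        x_new[media_idx] = innate[media_idx]   # media sources do not adapt
        x = x_new
    return x


if __name__ == "__main__":
    # 3 ordinary users + 2 opposing media sources with opinions -1 and +1
    W = np.array([
        [0.0, 0.3, 0.3, 0.2, 0.2],
        [0.3, 0.0, 0.3, 0.4, 0.0],
        [0.3, 0.3, 0.0, 0.0, 0.4],
        [0.0, 0.0, 0.0, 1.0, 0.0],   # media rows are irrelevant (always overwritten)
        [0.0, 0.0, 0.0, 0.0, 1.0],
    ])
    innate = [0.1, -0.2, 0.0, -1.0, 1.0]
    print(simulate_fj_with_media(W, innate, media_idx=[3, 4]))
```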
Camera traps are important tools in animal ecology for biodiversity monitoring and conservation. However, their practical application is limited by issues such as poor generalization to new and unseen locations. Images are typically associated with diverse forms of context, which may exist in different modalities. In this work, we exploit the structured context linked to camera trap images to boost out-of-distribution generalization for species classification tasks in camera traps. For instance, a picture of a wild animal could be linked to details about the time and place it was captured, as well as structured biological knowledge about the animal species. While often overlooked by existing studies, incorporating such context offers several potential benefits for better image understanding, such as addressing data scarcity and enhancing generalization. However, effectively incorporating such heterogeneous context into the visual domain is a challenging problem. To address this, we propose a novel framework that reformulates species classification as link prediction in a multimodal knowledge graph (KG). This framework enables the seamless integration of diverse multimodal contexts for visual recognition. We apply this framework to out-of-distribution species classification on the iWildCam2020-WILDS and Snapshot Mountain Zebra datasets and achieve competitive performance with state-of-the-art approaches. Furthermore, our framework enhances sample efficiency for recognizing under-represented species.
Text-Attributed Graphs (TAGs) are graphs of connected textual documents. Graph models can efficiently learn TAGs, but their training heavily relies on human-annotated labels, which are scarce or even unavailable in many applications. Large language models (LLMs) have recently demonstrated remarkable capabilities in few-shot and zero-shot TAG learning, but they suffer from scalability, cost, and privacy issues. Therefore, in this work, we focus on synergizing LLMs and graph models with their complementary strengths by distilling the power of LLMs into a local graph model on TAG learning. To address the inherent gaps between LLMs (generative models for texts) and graph models (discriminative models for graphs), we propose first to let LLMs teach an interpreter with rich rationale and then let a student model mimic the interpreter's reasoning without LLMs' rationale. We convert LLM's textual rationales to multi-level graph rationales to train the interpreter model and align the student model with the interpreter model based on the features of TAGs. Extensive experiments validate the efficacy of our proposed framework.
Hierarchical Reinforcement Learning (HRL) is specially designed for environments characterized by long-term goals and sparse rewards. High-level policies in HRL learn to generate appropriate subgoals aimed at accomplishing the final goal, while low-level policies focus on achieving these designated subgoals. Recently, graph-based HRL algorithms have demonstrated enhanced learning capabilities through the structural representation of state spaces as graphs. However, existing graph-based HRL methods still often generate inefficient subgoals. This paper introduces a new method, Novelty-aware Graph Traversal and Expansion (NGTE), which selects an optimal node at the graph boundary, termed an Outpost Subgoal, as a direct path toward the final goal. Once the Outpost Subgoal is reached, NGTE transitions into an exploration phase, offering exploration subgoals within a reachable distance to efficiently expand the graph. Demonstrated in complex environments such as quadruped robot navigation and robotic arm manipulation, NGTE consistently outperforms existing graph and non-graph HRL methods, showing outstanding performance, especially in the most challenging scenarios with fixed start and fixed goal conditions.
Drug-Target binding Affinity (DTA) prediction is essential for drug discovery. Despite the application of deep learning methods to DTA prediction, the achieved accuracy remains suboptimal. In this work, inspired by the recent success of retrieval methods, we propose kNN-DTA, a non-parametric embedding-based retrieval method built on top of a pre-trained DTA prediction model, which can extend the power of the DTA model with no or negligible cost. Different from existing methods, we introduce two neighbor aggregation ways from both embedding space and label space that are integrated into a unified framework. Specifically, we propose a label aggregation with pair-wise retrieval and a representation aggregation with point-wise retrieval of the nearest neighbors. This method executes in the inference phase and can efficiently boost the DTA prediction performance with no training cost. In addition, we propose an extension, Ada-kNN-DTA, an instance-wise and adaptive aggregation with lightweight learning. Results on four benchmark datasets show that kNN-DTA brings significant improvements, outperforming previous state-of-the-art (SOTA) results, e.g., on the BindingDB IC50 and Ki testbeds, kNN-DTA obtains new records of RMSE 0.684 and 0.750. The extended Ada-kNN-DTA further improves the performance to 0.675 and 0.735 RMSE. These results strongly demonstrate the effectiveness of our method. Results in other settings and comprehensive studies/analyses also show the great potential of our kNN-DTA approach.
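The label-aggregation branch described above can be illustrated with a small NumPy sketch: given the embedding of a query drug-target pair, retrieve the k nearest training pairs and blend their affinity labels (softmax over negative distances) with the base model's own prediction. The temperature, interpolation weight, and function names below are illustrative assumptions, not the paper's tuned values or API.

```python
import numpy as np

def knn_label_aggregation(query_emb, train_embs, train_labels,
                          model_pred, k=8, temperature=1.0, lam=0.5):
    """Blend a DTA model's prediction with labels of nearest training pairs.

    query_emb:    (d,) embedding of the query drug-target pair
    train_embs:   (N, d) embeddings of training pairs from the datastore
    train_labels: (N,) measured affinities of those pairs
    model_pred:   the base model's own affinity prediction
    lam:          interpolation weight between model and kNN estimate
    """
    dists = np.linalg.norm(train_embs - query_emb, axis=1)
    nn_idx = np.argsort(dists)[:k]                 # k nearest neighbors in embedding space
    logits = -dists[nn_idx] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                       # softmax over negative distances
    knn_estimate = float(weights @ train_labels[nn_idx])
    return lam * model_pred + (1.0 - lam) * knn_estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embs = rng.normal(size=(100, 16))
    labels = rng.uniform(5.0, 9.0, size=100)
    print(knn_label_aggregation(embs[0] + 0.01, embs, labels, model_pred=7.2))
```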
The Visual Question Answering (VQA) task has recently become notorious because models are prone to predicting well-educated "guesses" as answers rather than deriving them through visual understanding. The main culprit for this is that VQA models memorize the shortcut biases in the dataset during the training process. While a variety of solutions have been proposed, they solely focus on the shortcuts in the language modality, leaving other kinds of shortcut biases untouched. In this paper, we shift our lens to all kinds of shortcuts and resort to causal inference to circumvent these issues. Causal inference methods can discover the causal effect (P(Y|do(X))) [27] rather than statistic-based spurious correlations (P(Y|X)) in the dataset, making them naturally suitable for debiasing learning. To deconfound these shortcut biases, we propose a causality-aware method, coined as Dual Causal Intervention (DCI), to endow VQA models with better generalization by combining two components: linguistic backdoor intervention and visual front-door intervention. To be specific, we harness backdoor intervention to cut off the effects of confounders in the language modality and employ front-door intervention to eliminate the impact of confounders in the visual modality. We conducted extensive experiments on two challenging Out-of-Distribution (OOD) benchmarks, including VQA-VS and VQA-CE, which are designed to assess the robustness of VQA models under different shortcut biases. The experimental results show the effectiveness of our method. Specifically, our approach outperforms the current state-of-the-art debiasing methods on the IID metric and all nine OOD metrics of the VQA-VS dataset, and also surpasses the performance of the best-performing methods on all metrics of the VQA-CE dataset.
The drastic performance degradation of Graph Neural Networks (GNNs) as the depth of the graph propagation layers exceeds 8-10 is widely attributed to a phenomenon of Over-smoothing. Although recent research suggests that Over-smoothing may not be the dominant reason for such a performance degradation, it has not provided rigorous analysis from a theoretical view, which warrants further investigation. In this paper, we systematically analyze the real dominant problem in deep GNNs and, via empirical experiments and theoretical gradient analysis, identify the issues that GNNs designed to address Over-smoothing essentially work on. We theoretically prove that the difficult training problem of deep MLPs is actually the main challenge, and various existing methods that supposedly tackle Over-smoothing actually improve the trainability of MLPs, which is the main reason for their performance gains. Our further investigation into trainability issues reveals that properly constrained smaller upper bounds of gradient flow notably enhance the trainability of GNNs. Experimental results on diverse datasets demonstrate consistency between our theoretical findings and empirical evidence. Our analysis provides new insights into constructing deep graph models.
Cross-domain Aspect Sentiment Triplet Extraction (ASTE) aims to extract fine-grained sentiment elements from target domain sentences by leveraging the knowledge acquired from the source domain. Due to the absence of labeled data in the target domain, recent studies tend to rely on pre-trained language models to generate large amounts of synthetic data for training purposes. However, these approaches entail additional computational costs associated with the generation process. Different from them, we discover a striking resemblance between table-filling methods in ASTE and two-stage Object Detection (OD) in computer vision, which inspires us to revisit the cross-domain ASTE task and approach it from an OD standpoint. This allows the model to benefit from the OD extraction paradigm and region-level alignment. Building upon this premise, we propose a novel method named Table-Filling via Mean Teacher (TFMT). Specifically, the table-filling methods encode the sentence into a 2D table to detect word relations, while TFMT treats the table as a feature map and utilizes a region consistency to enhance the quality of those generated pseudo labels. Additionally, considering the existence of the domain gap, a cross-domain consistency based on Maximum Mean Discrepancy is designed to alleviate domain shift problems. Our method achieves state-of-the-art performance with minimal parameters and computational costs, making it a strong baseline for cross-domain ASTE.
Knowledge graph completion (KGC) aims to infer missing facts from existing facts. Learning logical rules plays a pivotal role in KGC, as logical rules excel in explaining why a missing fact is inferred. Most existing rule learning methods focus merely on learning chain-like rules, neglecting type constraints on entities. In practice, type constraints are crucial in expressing precise rules. Therefore, we propose a novel formalism for logical rules named TC-rules, which complements chain-like rules with both explicit and implicit type constraints on entity variables. Accordingly, we propose an end-to-end approach to effectively learn TC-rules, by parameterizing a neural model to simulate the inference of TC-rules. Considering that existing end-to-end methods learn two different sets of logical rules to respectively answer a head query (?, r_new, t) and a tail query (h, r_new, ?), leading to confusing explanations for supporting a new fact (h, r_new, t), we propose a bi-directional learning mechanism to ensure that the TC-rules learnt for answering (?, r_new, t) are the same as the TC-rules learnt for answering (h, r_new, ?). Experimental results on eight benchmark datasets demonstrate that the proposed method outperforms state-of-the-art rule learners in both the link prediction task and the triple classification task. Furthermore, our case study confirms that expressive TC-rules can be extracted from the parameter assignment of the learnt neural model.
Multimodal Entity Linking (MEL) is a crucial task that aims at linking ambiguous mentions within multimodal contexts to the referent entities in a multimodal knowledge base, such as Wikipedia. Existing methods focus heavily on using complex mechanisms and extensive model tuning methods to model the multimodal interaction on specific datasets. However, these methods overcomplicate the MEL task and overlook the visual semantic information, which makes them costly and hard to scale. Moreover, these methods cannot solve the issues like textual ambiguity, redundancy, and noisy images, which severely degrade their performance. Fortunately, the advent of Large Language Models (LLMs) with robust capabilities in text understanding and reasoning, particularly Multimodal Large Language Models (MLLMs) that can process multimodal inputs, provides new insights into addressing this challenge. However, how to design a universally applicable LLMs-based MEL approach remains a pressing challenge. To this end, we propose UniMEL, a <u>uni</u>fied framework which establishes a new paradigm to process <u>m</u>ultimodal <u>e</u>ntity <u>l</u>inking tasks using LLMs. In this framework, we employ LLMs to augment the representation of mentions and entities individually by integrating textual and visual information and refining textual information. Subsequently, we employ the embedding-based method for retrieving and re-ranking candidate entities. Then, with only ~0.26% of the model parameters fine-tuned, LLMs can make the final selection from the candidate entities. Extensive experiments on three public benchmark datasets demonstrate that our solution achieves state-of-the-art performance, and ablation studies verify the effectiveness of all modules. Our code is available at https://github.com/Javkonline/UniMEL.
Hard-label black-box textual adversarial attack presents a challenging task where only the predictions of the victim model are available. Moreover, several constraints further complicate the task of launching such attacks, including the inherent discrete and non-differentiable nature of text data and the need to introduce subtle perturbations that remain imperceptible to humans while preserving semantic similarity. Despite the considerable research efforts dedicated to this problem, existing methods still suffer from several limitations. For example, algorithms based on complex heuristic searches necessitate extensive querying, rendering them computationally expensive. The introduction of continuous gradient strategies into discrete text spaces often leads to estimation errors. Meanwhile, geometry-based strategies are prone to falling into local optima. To address these limitations, in this paper, we introduce SGFL-Attack, a novel approach that leverages a <u>S</u>imilarity-<u>G</u>uidance strategy based on <u>F</u>eedback <u>L</u>earning for hard-label textual adversarial attack, with limited query budget. Specifically, the proposed SGFL-Attack utilizes word embedding vectors to assess the importance of words and positions in text sequences, and employs a feedback learning mechanism to determine reward or punishment based on changes in predicted labels caused by replacing words. In each iteration, SGFL-Attack guides the search based on knowledge acquired from the feedback learning mechanism, generating more similar samples while maintaining low perturbations. Moreover, to reduce the query budget, we incorporate local hash mapping to avoid redundant queries during the search process. Extensive experiments on seven widely used datasets show that the proposed SGFL-Attack method significantly outperforms state-of-the-art baselines and defenses over multiple language models.
Recently, integrating external tools with Large Language Models (LLMs) has gained significant attention as an effective strategy to mitigate the limitations inherent in their pre-training data. However, real-world systems often incorporate a wide array of tools, making it impractical to input all tools into LLMs due to length limitations and latency constraints. Therefore, to fully exploit the potential of tool-augmented LLMs, it is crucial to develop an effective tool retrieval system. Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions, frequently leading to the retrieval of redundant, similar tools. Consequently, these methods fail to provide a complete set of diverse tools necessary for addressing the multifaceted problems encountered by LLMs. In this paper, we propose a novel model-agnostic <u>CO</u>llaborative <u>L</u>earning-based <u>T</u>ool Retrieval approach, COLT, which not only captures the semantic similarities between user queries and tool descriptions but also takes into account the collaborative information of tools. Specifically, we first fine-tune the PLM-based retrieval models to capture the semantic relationships between queries and tools in the semantic learning stage. Subsequently, we construct three bipartite graphs among queries, scenes, and tools and introduce a dual-view graph collaborative learning framework to capture the intricate collaborative relationships among tools during the collaborative learning stage. Extensive experiments on both the open benchmark and the newly introduced ToolLens dataset show that COLT achieves superior performance. Notably, the performance of BERT-mini (11M) with our proposed model framework outperforms BERT-large (340M), which has 30 times more parameters. Furthermore, we will release ToolLens publicly to facilitate future research on tool retrieval.
Recommender systems typically represent users and items by learning their embeddings, which are usually set to uniform dimensions and dominate the model parameters. However, real-world recommender systems often operate in streaming recommendation scenarios, where the number of users and items continues to grow, leading to substantial storage resource consumption for these embeddings. Although a few methods attempt to mitigate this by employing embedding size search strategies to assign different embedding dimensions in streaming recommendations, they assume that the embedding size grows with the frequency of users/items, which eventually still exceeds the predefined memory budget over time. To address this issue, this paper proposes to learn Scalable Lightweight Embeddings for streaming recommendation, called SCALL, which can adaptively adjust the embedding sizes of users/items within a given memory budget over time. Specifically, we propose to sample embedding sizes from a probabilistic distribution, with the guarantee to meet any predefined memory budget. By fixing the memory budget, the proposed embedding size sampling strategy can increase and decrease the embedding sizes in accordance with the frequency of the corresponding users or items. Furthermore, we develop a reinforcement learning-based search paradigm that models each state with mean pooling to keep the length of the state vectors fixed, invariant to the changing number of users and items. As a result, the proposed method can provide embedding sizes to unseen users and items. Comprehensive empirical evaluations on two public datasets affirm the effectiveness and advantages of our proposed method.
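The abstract leaves the exact sampling distribution unspecified; the following is a hedged toy sketch of one way to draw per-entity embedding sizes roughly proportional to interaction frequency while keeping the total number of embedding parameters close to a fixed memory budget. The frequency-proportional scheme, stochastic rounding, and all names are assumptions for illustration, not SCALL's actual procedure.

```python
import numpy as np

def sample_embedding_sizes(frequencies, budget, min_dim=1, max_dim=64, seed=0):
    """Sample per-entity embedding dimensions under a total parameter budget.

    frequencies: (n,) interaction counts of users/items
    budget:      total number of embedding parameters allowed (sum of dims)

    Expected sizes are proportional to frequency, clipped to [min_dim, max_dim],
    then rescaled toward the budget; stochastic rounding keeps the realized
    total close to the budget on average.
    """
    rng = np.random.default_rng(seed)
    freq = np.asarray(frequencies, dtype=float)
    target = budget * freq / freq.sum()                    # frequency-proportional split
    target = np.clip(target, min_dim, max_dim)
    target *= budget / target.sum()                        # re-fit to the budget after clipping
    floor = np.floor(target).astype(int)
    sizes = floor + (rng.random(len(target)) < (target - floor))  # stochastic rounding
    return np.clip(sizes, min_dim, max_dim)


if __name__ == "__main__":
    sizes = sample_embedding_sizes([500, 120, 40, 8, 3], budget=120)
    print(sizes, sizes.sum())
```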
Serverless computing offers resource efficiency, cost efficiency, and a "pay-as-you-go" pricing model, which makes it highly attractive to both users and cloud providers. However, serverless computing faces a serious cold start problem, especially for deep neural network (DNN) inference, which requires low latency. Existing cold start optimization focuses only on quick container start and fast runtime and library loading. However, DNN application bootstrap (DNN framework load and start, model initialization, model download, deserialization and copy) is the leading contributor to the overall cold start time. As the model size grows, the application-level bootstrap overhead becomes more severe.
We present PISeL, a generic and fast application-level cold-start optimization mechanism for DNN inference. We propose a layer-grouping mechanism and policy to pipeline model download, model deserialization and copy, and request execution. The grouping policy strikes a balance that minimizes both pipeline bubble risk and synchronization overhead. The pipelining process is transparent to a variety of DNN jobs and is implemented with hook points in a lightweight manner. PISeL not only greatly reduces the cold start time, but also the peak memory usage, which can easily incur out-of-memory (OOM) problems. Our experiments show that PISeL accelerates cold start across all evaluated system configurations and DNN models. PISeL speeds up cold start times by 37% and 63% using the PyTorch framework on CPU and GPU, respectively, and by 29% and 33% using the TensorFlow framework on CPU and GPU. Furthermore, PISeL reduces maximum memory usage by up to 59% and 30% using the PyTorch and TensorFlow frameworks, respectively.
Query refinement aims to enhance the relevance of search results by modifying users' original queries into refined versions. State-of-the-art query refinement models have been trained on web query logs, which are predisposed to topic drifts. To fill the gap, a few works have proposed generating benchmark datasets of (query → refined query) pairs through an overwhelming application of unsupervised or supervised modifications to the original query while controlling topic drifts. In this paper, however, we propose leveraging natural language backtranslation, a round-trip translation of a query from a source language via target languages, as a simple yet effective unsupervised approach to scale up generating gold-standard benchmark datasets. Backtranslation can (1) uncover terms that are omitted in a query for being commonly understood in a source language, but may not be known in a target language (e.g., 'figs' →(tamil) 'அத்திமரங்கள்' → 'the fig trees'), (2) augment a query with context-aware synonyms in a target language (e.g., 'italian nobel prize winners' →(farsi) 'برنده های ایتالیایی جایزه نوبل' → 'italian nobel laureates'), and (3) help with the semantic disambiguation of polysemous terms and collocations (e.g., 'custer's last stand' →(malay) 'pertahan terakhir custer' → 'custer's last defence'). Our experiments across 5 query sets with different query lengths and topics and 10 languages from 7 language families using 2 neural machine translators validated the effectiveness of query backtranslation in generating a more extensive gold-standard dataset for query refinement. We open-sourced our research at https://github.com/fani-lab/RePair/tree/nqlb.
Time series prediction presents a significant challenge across various domains, such as transportation systems, environmental science, and multiple industrial sectors. Real-world time series data commonly exhibit periodic patterns and irregular sampling rates. Recent advancements in long sequence time series forecasting have made significant progress in adopting deep neural networks, particularly the Transformers, renowned for their robust representational capabilities. However, current Transformer-based models consider time steps as discrete tokens, thereby failing to account for periodicity and temporal intervals when selecting relevant time steps in the past. To address this limitation, we propose an end-to-end framework called Periormer for forecasting irregularly sampled time series. Periormer comprises three key components: (1) a novel input embedding layer that encodes the periodicity and time interval information, analogous to positional encoding in Transformers; (2) a feature-wise periodic attention mechanism that selects essential data points considering the periods and amplitudes of the periodic signals; and (3) a cross-feature periodic attention mechanism that identifies essential features relevant to the prediction. Experiments on four real-world datasets and one synthetic dataset demonstrate that Periormer reduces the mean squared error by 14.9% compared to state-of-the-art models.
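The exact embedding layer of Periormer is not reproduced here; as a rough analogue of the idea, the sketch below encodes continuous timestamps with sinusoidal features whose frequencies include both generic positional scales and a known (or estimated) dominant period, so that irregular sampling intervals and periodicity are both visible to downstream attention layers. The construction and all names are assumptions for illustration, not the paper's layer.

```python
import numpy as np

def periodic_time_encoding(timestamps, d_model=16, period=24.0):
    """Encode continuous, possibly irregular timestamps into d_model features.

    Most feature pairs use geometrically spaced frequencies (as in the standard
    Transformer positional encoding, but applied to real-valued time), and one
    pair is pinned to the dominant period so the phase within a cycle is explicit.
    """
    t = np.asarray(timestamps, dtype=float)[:, None]        # (L, 1)
    k = np.arange(d_model // 2)[None, :]                     # (1, d_model/2)
    freqs = 1.0 / (10000.0 ** (2 * k / d_model))             # geometric frequency ladder
    freqs[0, 0] = 2 * np.pi / period                          # pin one channel to the known period
    angles = t * freqs                                        # (L, d_model/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (L, d_model)


if __name__ == "__main__":
    # irregularly sampled hours spanning two days
    ts = [0.0, 1.5, 2.0, 7.25, 24.0, 25.5, 31.0]
    print(periodic_time_encoding(ts, d_model=8, period=24.0).shape)  # (7, 8)
```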
Perceptual hashing algorithms have been used extensively to detect duplicate images, similar images for reverse image search, inappropriate and explicit images, and child sexual abuse material (CSAM). These algorithms use various techniques to extract perceptual features from an image to create a succinct representation called the hash, which is akin to the biometric marker of an image. This paper explores the ability of perceptual hashes to determine whether an image is a tampered version of a previously known image, i.e., it was created by object addition or removal from the original known image. In particular, a fast and efficient DCT-based perceptual hashing algorithm called SmartHash is proposed. SmartHash is an extension of the well-known and widely used pHash algorithm. It is evaluated on several publicly available datasets of tampered images and is shown to have high accuracy and precision in detecting such images. Additionally, an in-depth examination of the results is provided to quantify the limitations and, hence, the operating parameters of any software that integrates SmartHash in its workflow. A comparison with the state-of-the-art Apple NeuralHash and Microsoft PhotoDNA is provided in the context of detecting tampered images. This work contributes to the content authentication initiative led by Adobe to establish content provenance and authentication across the Internet.
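SmartHash itself is described only as a DCT-based extension of pHash, so its specifics are not shown here; the underlying pHash construction, however, is well known. The sketch below (using Pillow and SciPy) shows that baseline: grayscale, downscale, 2-D DCT, keep the low-frequency block, and threshold against the median to obtain a 64-bit hash compared via Hamming distance. The file names in the usage example are hypothetical.

```python
import numpy as np
from PIL import Image
from scipy.fft import dctn

def phash(path, hash_size=8, highfreq_factor=4):
    """Baseline pHash: 64-bit perceptual hash of an image."""
    size = hash_size * highfreq_factor
    img = Image.open(path).convert("L").resize((size, size))   # grayscale + downscale
    dct = dctn(np.asarray(img, dtype=float), norm="ortho")     # 2-D DCT
    low = dct[:hash_size, :hash_size]                          # low-frequency block
    return (low > np.median(low)).flatten()                    # 1 bit per coefficient

def hamming(h1, h2):
    """Number of differing bits; small distances indicate near-duplicates."""
    return int(np.count_nonzero(h1 != h2))


if __name__ == "__main__":
    h_orig = phash("original.jpg")      # hypothetical file names
    h_edit = phash("tampered.jpg")
    print(hamming(h_orig, h_edit))      # low distance -> likely the same or lightly edited image
```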
How can we mine frequent path regularities from a graph with edge labels and vertex attributes? The task of association rule mining successfully discovers regular patterns in item sets and substructures. Still, to our best knowledge, this concept has not yet been extended to path patterns in large property graphs. In this paper, we introduce the problem of path association rule mining (PARM). Applied to any reachability path between two vertices within a large graph, PARM discovers regular ways in which path patterns, identified by vertex attributes and edge labels, co-occur with each other. We develop an efficient and scalable algorithm PIONEER that exploits an anti-monotonicity property to effectively prune the search space. Further, we devise approximation techniques and employ parallelization to achieve scalable path association rule mining. Our experimental study using real-world graph data verifies the significance of path association rules and the efficiency of our solutions.
With the prevalence of social media platforms, accurately identifying the same users across different networks through network alignment has become crucial. Existing methods often struggle due to sparse or absent user-identifiable information (node attributes), highlighting the need for augmenting node attributes. However, research on attribute augmentation remains largely under-explored. In this study, we aim to design augmented attributes that enhance network alignment by reflecting three key structural <u>C</u>haracteristics: (C1) global structural characteristic, which reflects the global network structure; (C2) seed-based structural characteristic, which leverages cross-network structural information associated with seed nodes; and (C3) multi-aspect structural characteristic, which employs diverse structural relationship measures. To this end, we propose a novel approach for designing trustworthy Augmented Seed-baSed and multI-aspect STructurAl iNformaTion (ASSISTANT) attributes. To enhance alignment performance, we also present a learning module that utilizes a gate mechanism to select the most effective measure dynamically. Extensive experiments across various datasets demonstrate the following: 1) our network alignment framework, which includes a gate mechanism module, significantly outperforms state-of-the-art methods in alignment accuracy; 2) other state-of-the-art methods using ASSISTANT attributes as input substantially boost their own alignment accuracy; and 3) using only ASSISTANT attributes without any training process also leads to effective alignment, showcasing their high trustworthiness.
Automatic Chinese patent approval prediction is an emerging and valuable task in patent analysis. However, it involves a rigorous and transparent decision-making process that includes patent comparison and examination to assess its innovation and correctness. This necessity of decision evidentiality, coupled with intricate patent comprehension, presents significant challenges and obstacles for the patent analysis community. Consequently, few existing studies have addressed this task. This paper presents the pioneering effort on this task using a retrieval-based classification approach. We propose a novel framework called DiSPat, which focuses on structural representation learning and disentanglement to predict the approval of Chinese patents and offer decision-making evidence. DiSPat comprises three main components: base reference retrieval to retrieve the Top-k most similar patents as a reference base; structural patent representation to exploit the inherent claim hierarchy in patents for learning a structural patent representation; and disentangled representation learning to learn disentangled patent representations that enable the establishment of an evidential decision-making process. To ensure a thorough evaluation, we have meticulously constructed three datasets of Chinese patents. Extensive experiments on these datasets unequivocally demonstrate that our DiSPat surpasses state-of-the-art baselines on patent approval prediction, while also exhibiting enhanced evidentiality.
Human action recognition using commercial millimeter wave radar is gaining significant attention in smart elderly care and smart homes. Due to privacy concerns, the sensing data often needs to be processed locally on embedded systems with restricted computational resources, necessitating a balance between recognition accuracy and efficiency. In this paper, we propose a fast human action recognition framework based on 3D point cloud sequences generated by commercial 4D millimeter wave imaging radar systems. The framework comprises two primary phases: data preprocessing and spatial-temporal feature extraction. During the data preprocessing phase, we employ a sliding window approach for frame fusion to enhance the spatial information of the sparse point cloud while retaining its temporal features. Additionally, Morton coding is used to address the disorderliness in the point cloud sequence. For spatial-temporal feature extraction, we introduce an innovative two-stage algorithm. In the spatial feature extraction stage, we initially extract local spatial features for each point, utilizing self-attention to construct a local graph and circumvent the limitations of using Euclidean distance in sparse point clouds. Subsequently, 3D frame fusion convolution is applied to extract spatial features at the frame level, reducing the length of the spatial feature map sequence and lowering computational requirements for subsequent temporal feature extraction. In the temporal feature extraction stage, we employ a modified Transformer encoder with fine-grained feature fusion to extract temporal features. We conducted comprehensive experiments using both our collected dataset and the open dataset RadHar. The experimental outcomes demonstrate that our framework not only improves inference accuracy but also maintains satisfactory real-time performance on embedded platforms with constrained computational resources. When compared with state-of-the-art (SOTA) methods, our framework significantly enhances inference speed while retaining competitive inference accuracy. Codes and dataset are available at https://github.com/Feiyuyu0503/FastHAR.
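Morton coding, used above to impose an order on the otherwise unordered point cloud, interleaves the bits of quantized x, y, z coordinates so that points close in 3-D tend to be close in the resulting 1-D ordering. A minimal sketch follows; the 10-bit quantization resolution is an illustrative choice, not the paper's setting.

```python
def _spread_bits(v):
    """Spread the lowest 10 bits of v so there are two zero bits between each."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0x030000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v

def morton3d(x, y, z):
    """Interleave three 10-bit integer coordinates into one 30-bit Morton code."""
    return _spread_bits(x) | (_spread_bits(y) << 1) | (_spread_bits(z) << 2)

def morton_sort(points, bins=1024):
    """Order a list of (x, y, z) points in [0, 1)^3 by their Morton code."""
    def code(p):
        q = [min(int(c * bins), bins - 1) for c in p]   # quantize to a 10-bit grid
        return morton3d(*q)
    return sorted(points, key=code)


if __name__ == "__main__":
    pts = [(0.9, 0.1, 0.2), (0.11, 0.12, 0.1), (0.1, 0.1, 0.1)]
    print(morton_sort(pts))   # nearby points end up adjacent in the ordering
```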
Federated unlearning (FU) algorithms offer participants in federated learning (FL) the "right to be forgotten" for their individual data and its impact on a collaboratively trained model. Existing FU algorithms primarily focus on accelerating the retraining process and enhancing the utility of the retrained models following data removal requests. However, these approaches generally lack consideration for the robustness of FU algorithms in potential adversarial environments, where adversaries can craft malicious data removal requests to compromise the retrained model. In this work, we introduce a robust federated unlearning framework (robustFU) which notably enhances the resilience of FU algorithms against a wide range of adversarial attacks. In robustFU, we design a novel dynamic conflict sample compensation algorithm that dynamically reintroduces randomly generated samples with significant information gain to the participating clients during retraining. Additionally, robustFU employs an innovative global reweighting mechanism which adjusts the weight of each model update during the global aggregation, based on its degree of misalignment with the trained model prior to unlearning. Extensive experiments demonstrate the effectiveness and robustness of the proposed robustFU framework under adversarial environments. Furthermore, robustFU significantly accelerates the retraining process, achieving a 2.53× speed-up compared to the retrain-from-scratch baseline.
Relation extraction (RE) in complex scenarios faces challenges such as diverse relation types and ambiguous relations between entities within a single sentence, leading to poor performance of pure "text-in, text-out" language models (LMs). To address these challenges, in this paper we propose an agent-based RE framework, namely "AgentRE", which fully leverages the capabilities of large language models (LLMs), including memory, retrieval, and reflection, to achieve RE in complex scenarios. Specifically, three major modules are built into AgentRE, serving as tools that help the agent acquire and process various kinds of useful information, thereby improving RE performance. Our extensive experimental results on two datasets in English and Chinese demonstrate AgentRE's superior performance, especially in low-resource scenarios. Additionally, the trajectories generated by AgentRE can be refined to construct a high-quality training dataset incorporating different reasoning methods, which can be used to fine-tune smaller models.
Large Language Models (LLMs) have shown proficiency in question-answering tasks but often struggle to integrate real-time knowledge, leading to potentially outdated or inaccurate responses. This problem becomes even more challenging when dealing with multi-hop questions, since they require LLMs to update and integrate multiple knowledge pieces relevant to the questions. To tackle the problem, we propose the Retrieval-Augmented model Editing (RAE) framework for multi-hop question answering. RAE first retrieves edited facts and then refines the language model through in-context learning. Specifically, our retrieval approach, based on mutual information maximization, leverages the reasoning abilities of LLMs to identify chain facts that traditional similarity-based searches might miss. In addition, our framework includes a pruning strategy to eliminate redundant information from the retrieved facts, which enhances the editing accuracy and mitigates the hallucination problem. Our framework is supported by theoretical justification for its fact retrieval efficacy. Finally, comprehensive evaluation across various LLMs validates RAE's ability in providing accurate answers with updated knowledge. Our code is available at: https://github.com/sycny/RAE.
Knowledge graphs have soared in popularity by supporting different types of applications and domains. In this context, the property graph data model has become an emerging standard in industry and academia. With its widespread use, there is also an increasing interest in investigating constraints for property graph data and their applications in data profiling. Graph Generating Dependencies (GGDs) are a class of property graph data dependencies that can express constraints on topology and properties of nodes and edges of the graph, making them a suitable candidate to expose an overview of the property graph to the user (profile graph data). However, GGDs can be difficult to set manually. To solve this issue, we propose a framework for discovering GGDs automatically from the property graph to profile graph data. Our framework has three main steps: (1) pre-processing, (2) candidate generation, and, (3) GGD extraction. Our results show that the discovered set of GGDs can give an overview of the input graph, including schema-level information between the graph patterns and attributes.
In the field of Multi-Person Pose Estimation (MPPE), Radio Frequency (RF)-based methods can operate effectively regardless of lighting conditions and obscured line-of-sight situations. Existing RF-based MPPE methods typically involve either 1) converting RF signals into heatmap images through complex preprocessing, or 2) applying a deep embedding network directly to raw RF signals. The first approach, while delivering decent performance, is computationally intensive and time-consuming. The second, though simpler in preprocessing, yields lower MPPE accuracy and generalization performance. This paper proposes an efficient and lightweight one-stage MPPE model based on raw RF signals. By sub-grouping RF signals and embedding them using a shared single-layer CNN followed by multi-head attention, this model outperforms previous methods that embed all signals at once through a large and deep CNN. Additionally, we propose a new self-supervised learning (SSL) method that takes inputs from one unmasked subgroup and the remaining masked subgroups to predict the latent representations of the masked data. Empirical results demonstrate that our model improves MPPE accuracy by up to 15 in PCKh@0.5 compared to previous methods using raw RF signals. In particular, the proposed SSL method significantly enhances performance when the RF antennas are placed in new locations or in front of obstacles, with the gains growing as the number of people increases. Our code and dataset are available on GitHub.
Individual personalities significantly influence our perceptions, decisions, and social interactions, and understanding them is particularly crucial for gaining insights into human behavior patterns in online social network analysis. Many psychological studies have observed that personalities are strongly reflected in people's social behaviors and social environments. Unfortunately, psychological traits like personality are high-level constructs hidden deep within the data, which traditional data mining approaches struggle to uncover. Moreover, the data quality of online social networks is far from sufficient to support such profound psychological analysis, because user behavior records and attributes are usually fragmented and miss much of the key information needed to understand a person in depth. In addition, the social environments in online networks are very complicated, leaving the interaction patterns between users and their environments underexplored.
In light of these problems, this paper proposes a sociological analysis framework for an individual's personality from an environment-based view instead of individual-level data mining. Specifically, to comprehensively understand an individual's behavior from low-quality records, we leverage the powerful associative ability of LLMs by designing an effective prompt. In this way, LLMs can integrate various scattered pieces of information with their external knowledge to generate higher-quality profiles, which significantly improves personality analysis performance. To explore the interactive mechanism between users and their online environments, we design an effective hypergraph neural network in which the hypergraph nodes are users and the hyperedges are social environments. We offer a useful dataset with user profile data, personality traits, and several detected environments from a real-world social platform. To the best of our knowledge, this is the first network-based dataset containing both hypergraph structure and social information, which could push forward future research in this area. By employing the framework on this dataset, we can effectively capture the nuances of individual personalities and their online behaviors, leading to a deeper understanding of human interactions in the digital world. Our code and dataset are openly accessible.
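To make the users-as-nodes, environments-as-hyperedges formulation concrete, here is a minimal hypergraph convolution layer in the spirit of standard HGNN propagation; it is our simplified sketch with a random-walk style normalization, not the paper's architecture, and all dimensions are placeholders.

```python
import torch
import torch.nn as nn

class HypergraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, X, H):
        # X: [num_users, in_dim] user features; H: [num_users, num_envs] incidence matrix
        Dv = H.sum(dim=1).clamp(min=1)                      # user degrees
        De = H.sum(dim=0).clamp(min=1)                      # environment sizes
        msg = (H / De) @ (H.t() @ (X / Dv.unsqueeze(1)))    # user -> environment -> user
        return torch.relu(self.theta(msg))

H = (torch.rand(100, 8) < 0.2).float()   # toy users-in-environments incidence matrix
X = torch.randn(100, 16)                  # toy user profile embeddings
out = HypergraphConv(16, 32)(X, H)
```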
Relation extraction methods are currently dominated by deep neural models, which capture complex statistical patterns while being brittle and vulnerable to perturbations in data and distribution. Explainability techniques offer a means for understanding such vulnerabilities, and thus represent an opportunity to mitigate future errors; yet, existing methods are limited to describing what the model 'knows', while failing entirely to explain what the model does not know. This paper presents a new method for diagnosing model predictions and detecting potential inaccuracies. Our approach involves breaking down the problem into two components: (i) determining the necessary knowledge the model should possess for accurate prediction, through human annotations, and (ii) assessing the actual knowledge possessed by the model, using explainable AI (XAI) methods. We apply our method to several relation extraction tasks and conduct an empirical study leveraging human specifications of what a model should know and does not know. Results show that human workers are capable of accurately specifying the model's should-knows despite variations in the specification, that the alignment between what a model really knows and what it should know is indeed indicative of model accuracy, and that the unknowns identified through our methods allow us to foresee future errors that might not otherwise have been observed.
Federated learning (FL) is a distributed machine learning paradigm in which clients collaboratively train models in a privacy-preserving manner. While centralized FL (CFL) suffers from single points of failure and performance bottlenecks, decentralized FL (DFL), which relies on inter-client communication, has emerged to eliminate the need for a central entity. However, without the coordination of a central server, heterogeneous data distributions across clients make local models in DFL inclined to diverge towards their local objectives, resulting in poor model accuracy. Moreover, each client in DFL needs to communicate with multiple neighbors, yielding a heavy communication load. To tackle these challenges, we propose a novel DFL framework called DFLStar, which improves DFL from two perspectives. First, to avoid significant divergence towards local data, DFLStar incorporates self-knowledge distillation to enhance local model training by assimilating knowledge from the aggregated model. Second, clients in DFLStar identify and select only the most informative neighbors (based on last-layer model similarity) for parameter exchange, thereby minimizing the communication overhead. Our experimental results on two real datasets demonstrate that DFLStar significantly reduces both communication overhead and training time compared to traditional DFL algorithms when reaching a given target accuracy. Furthermore, within a fixed training duration, DFLStar consistently obtains the highest model accuracy among the baselines.
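A hedged sketch of the two mechanisms named above, under our own assumptions about the loss weighting and what "last-layer similarity" means: local training regularized by distilling from the aggregated model, and neighbor selection by cosine similarity of last-layer weights.

```python
import torch
import torch.nn.functional as F

def local_loss(logits_local, logits_aggregated, targets, alpha=0.5, T=2.0):
    """Cross-entropy on local data plus self-distillation from the aggregated model."""
    ce = F.cross_entropy(logits_local, targets)
    kd = F.kl_div(F.log_softmax(logits_local / T, dim=1),
                  F.softmax(logits_aggregated.detach() / T, dim=1),
                  reduction="batchmean") * T * T
    return (1 - alpha) * ce + alpha * kd

def select_neighbors(my_last_layer, neighbor_last_layers, k=2):
    """Pick the k neighbors whose last-layer weights are most similar to ours."""
    sims = torch.stack([F.cosine_similarity(my_last_layer.flatten(), w.flatten(), dim=0)
                        for w in neighbor_last_layers])
    return torch.topk(sims, k=min(k, len(neighbor_last_layers))).indices.tolist()

# Toy usage with random tensors standing in for real models.
loss = local_loss(torch.randn(16, 10), torch.randn(16, 10), torch.randint(0, 10, (16,)))
peers = select_neighbors(torch.randn(10, 32), [torch.randn(10, 32) for _ in range(4)])
```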
The Execute-Order-Validate (EOV) model of blockchain has significantly improved throughput compared to the Order-Execute (OE) model. However, existing systems following the EOV model still struggle to meet the required throughput levels in many applications. We address two critical performance bottlenecks that hinder their throughput: the high cost of transaction re-ordering and the high cost of re-execution of invalid transactions. To address these challenges, we propose HTFabric, a method that combines fast re-ordering and parallel re-execution to achieve exceptionally high successful throughput. We have implemented HTFabric based on Hyperledger Fabric. Through extensive experiments, we demonstrate that HTFabric outperforms SOTA systems by 2.34 to 10.51 times, achieving a successful throughput of up to 8,930 TPS.
Real-world quantitative reasoning problems are complex, often including extra information irrelevant to the question (or "IR noise" for short). State-of-the-art (SOTA) prompting methods have improved Large Language Models' ability to perform quantitative reasoning on grade-school Math Word Problems (MWPs). To assess how well these SOTA methods handle IR noise, we constructed four new datasets, each consisting of 300 problems drawn from one of four public datasets: MAWPS, ASDiv, SVAMP, and GSM8K, with IR noise added. We call the collection of these new datasets "MPN"--Math Word Problems with IR Noise. We evaluated SOTA prompting methods using MPN, and we propose Noise Reduction Prompting (NRP) and its variant (NRP+) to reduce the impact of IR noise. Findings: our IR noise significantly degrades the performance of Chain-of-Thought (CoT) Prompting on three different backend models: ChatGPT (gpt-3.5-turbo-0613), PaLM2, and Llama3-8B-instruct. Among them, ChatGPT offers the best accuracy on MPN with and without IR noise. With IR noise, the performance of CoT, Least-To-Most Prompting, Progressive-Hint Prompting, and Program-aided Language Models with ChatGPT is significantly impacted, each with an average accuracy drop of above 12%. NRP is the least impacted by the noise, with an average accuracy drop of only around 1.9%. NRP+ and NRP perform comparably in the presence of IR noise.
The majority of GNNs are based on message-passing mechanisms. However, Message Passing Neural Networks (MPNNs) have inherent limitations in capturing long-range interactions. The exponentially growing node information is compressed into fixed-size representations through multiple rounds of message passing, leading to the over-squashing problem. This issue severely hinders the flow of information across the graph and creates a bottleneck in graph learning. The natural idea of introducing global attention to point-to-point communication, as adopted in Graph Transformers (GTs), lacks inductive biases on graph structures and relies on complex positional encodings to enhance their performance in practical tasks. In this paper, we observe that the sensitivity between nodes in MPNNs decreases exponentially with the shortest path distance. In contrast, GTs have constant sensitivity, which leads to a loss of inductive bias. To address these issues, we introduce structured state spaces to capture the hierarchy of rooted trees, achieving linear sensitivity with theoretical guarantees. We further propose a novel state-space model-based graph convolution, resulting in a new paradigm that retains both the strong inductive biases from MPNNs and the long-range modeling capabilities from GTs. Extensive experimental results on long-range and general graph benchmarks demonstrate the superiority of our approach.
To address the business needs of industrial recommendation systems, an increasing number of Multi-Domain Recommendation (MDR) methods are designed to improve recommendation performance on multiple domains simultaneously. Most MDR methods follow a multi-task learning paradigm, suffering from poor deployability and negative transfer. Due to the great success of large pre-trained models, the pre-train & fine-tune paradigm is attracting increasing attention. The latest methods introduce parameter-efficient fine-tuning techniques like prompt-tuning, showcasing high efficiency and effectiveness. However, these methods neglect the fundamental differences between recommendation and NLP tasks. The inadequate capacity of recommendation models restricts the effectiveness of prompts and adapters. Worse still, traditional natural domain division may group non-identically distributed samples into the same domain, violating the assumption of independent and identically distributed (i.i.d.) data. In this paper, we propose MultiLoRA, a Multi-directional Low Rank Adaptation paradigm for multi-domain recommendation. First we pre-train a universal model using all data samples. Then we conduct multiple domain divisions on the sample space. Under each division, we fine-tune the pre-trained model to obtain a set of domain-specific LoRAs. Finally, we learn a LoRA fusion module to integrate domain-specific preference patterns across multiple divisions. Experimental results on real-world datasets demonstrate notable advantages of MultiLoRA: (1) achieving SOTA performance, (2) showcasing remarkable compatibility, and (3) proving highly efficient, featuring only 2% trainable parameters compared to the backbone.
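For context, the sketch below shows a standard LoRA adapter of the kind such fine-tuning builds on: the pre-trained weight is frozen and only a low-rank update is trained per domain division. Rank, scaling, and dimensions are placeholder choices, and the LoRA fusion module is not shown.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                     # keep pre-trained weights frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))   # zero init: starts as identity to base
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

layer = LoRALinear(nn.Linear(64, 32))
y = layer(torch.randn(4, 64))                           # shape [4, 32]
```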
A bipartite graph models the relationship between two different sets of entities. Such graph data are increasingly dynamic and organized as streams with duplicate edges in real-world applications, such as customer-product interactions in e-commerce. A butterfly, i.e., a (2,2)-biclique, is the simplest cohesive substructure and is of great importance in bipartite graphs. However, it is challenging to estimate the number of butterflies in a large-scale, highly dynamic bipartite graph stream when given limited memory. Moreover, existing works on butterfly counting assume no duplicate edges in the bipartite graph stream, which reduces their accuracy on streams with duplicate edges. In this paper, we propose FABLE, a Fixed-size memory Approximate Butterfly counting algorithm for dupLicate Edges in bipartite graph streams. In FABLE, we compute the number of distinct edges by maintaining an ordered list of edge priorities for replacement and sampling. We provide theoretical proof of unbiasedness and derive the variance of the butterfly count. Our extensive experiments on 5 real-world datasets confirm that our approach achieves higher accuracy than the baseline method under the same memory usage.
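As a reference point for what FABLE approximates, the snippet below computes the exact butterfly count of a small static bipartite graph from pairwise common-neighbor counts, ignoring duplicate edges; FABLE estimates this quantity over a duplicate-edge stream under a fixed memory budget.

```python
from itertools import combinations
from collections import defaultdict
from math import comb

def butterfly_count(edges):
    """edges: iterable of (left_node, right_node) pairs (duplicates ignored)."""
    neigh = defaultdict(set)
    for u, v in set(edges):
        neigh[u].add(v)                  # adjacency of left-side nodes
    total = 0
    for u, w in combinations(neigh, 2):  # every pair of left-side nodes
        common = len(neigh[u] & neigh[w])
        total += comb(common, 2)         # each pair of shared neighbors forms one butterfly
    return total

print(butterfly_count([("a", 1), ("a", 2), ("b", 1), ("b", 2), ("a", 1)]))  # 1
```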
In educational data mining, concept prerequisite relations extraction determines which concepts need to be learned before learning another concept. It plays a crucial role in pedagogical practices, such as learning path planning and curriculum design. Deep neural networks, especially graph neural networks, have recently made significant strides in concept prerequisite relations extraction. However, existing methods face two primary limitations. (1) Methods with better performance construct heterogeneous complete graphs, leading to higher model complexity and training cost. Meanwhile, the performance of low-complexity methods is inferior to the former. (2) A disregard for temporal context, essential for learning, limits both the performance and the application of these methods. To address these issues, we propose a novel graph-based approach, called Learning-path based Concept Prerequisite Relations Extraction (LCPRE). LCPRE constructs a lightweight sparse graph in a simple manner, which reduces complexity from quadratic to linear and captures the temporal feature through learning-path, a comprehensible learning approach from one concept to another. Experimental results on three benchmark datasets demonstrate that LCPRE outperforms existing methods, establishing a new state-of-the-art in concept prerequisite relations extraction.
Recent advancements in Large Language Models (LLMs) have attracted considerable interest among researchers to leverage these models to enhance Recommender Systems (RSs). Existing work predominantly utilizes LLMs to generate knowledge-rich texts or uses LLM-derived embeddings as features to improve RSs. Although the extensive world knowledge embedded in LLMs generally benefits RSs, LLMs can only take a limited number of users and items as inputs and thus fail to adequately exploit collaborative filtering information. Considering its crucial role in RSs, one key challenge in enhancing RSs with LLMs lies in providing better collaborative filtering information through LLMs. In this paper, drawing inspiration from in-context learning and chain-of-thought reasoning in LLMs, we propose the Large Language Models enhanced Collaborative Filtering (LLM-CF) framework, which distills the world knowledge and reasoning capabilities of LLMs into collaborative filtering. We also explore a concise and efficient instruction-tuning method, which improves the recommendation capabilities of LLMs while preserving their general functionalities (e.g., no degradation on general LLM benchmarks). Comprehensive experiments on three real-world datasets demonstrate that LLM-CF significantly enhances several backbone recommendation models and consistently outperforms competitive baselines, showcasing its effectiveness in distilling the world knowledge and reasoning capabilities of LLMs into collaborative filtering.
The increasing proliferation of misinformation and its alarming impact have motivated both industry and academia to develop approaches for misinformation detection and fact checking. Recent advances in large language models (LLMs) have shown remarkable performance in various tasks, but their potential in misinformation detection remains relatively underexplored. Most existing state-of-the-art approaches either do not consider evidence and solely focus on claim-related features, or assume the evidence is provided. The few approaches that treat evidence retrieval as part of misinformation detection rely on fine-tuned models. In this paper, we investigate the potential of LLMs for misinformation detection in a zero-shot setting. We incorporate an evidence retrieval component, as it is crucial to gather pertinent information from various sources to assess the veracity of claims. To this end, we propose a novel re-ranking approach for multimodal evidence retrieval using both LLMs and large vision-language models (LVLMs). The retrieved evidence samples (images and texts) serve as the input for an LVLM-based approach for multimodal fact verification (LVLM4FV). To enable a fair evaluation, we address the issue of incomplete ground truth in an existing evidence retrieval dataset by annotating a more complete set of evidence samples for both image and text retrieval. Our experimental results on two datasets demonstrate the superiority of the proposed approach in both evidence retrieval and fact verification tasks, with better generalization capability.
Combinatorial medication recommendation (CMR) is a fundamental task of healthcare, which offers opportunities for clinical physicians to provide more precise prescriptions for patients with intricate health conditions, particularly in the scenarios of long-term medical care. Previous research efforts have sought to extract meaningful information from electronic health records (EHRs) to facilitate combinatorial medication recommendations. Existing learning-based approaches further consider the chemical structures of medications, but ignore the textual medication descriptions in which the functionalities are clearly described. Furthermore, the textual knowledge derived from the EHRs of patients remains largely underutilized. To address these issues, we introduce the Natural Language-Assisted Multi-modal Medication Recommendation (NLA-MMR), a multimodal alignment framework designed to learn knowledge from the patient view and medication view jointly. Specifically, NLA-MMR formulates CMR as an alignment problem from patient and medication modalities. In this vein, we employ pretrained language models (PLMs) to extract in-domain knowledge regarding patients and medications, serving as the foundational representation for both modalities. In the medication modality, we exploit both chemical structures and textual descriptions to create medication representations. In the patient modality, we generate the patient representations based on textual descriptions of diagnosis, procedure, and symptom. Extensive experiments conducted on three publicly accessible datasets demonstrate that NLA-MMR achieves new state-of-the-art performance, with a notable average improvement of 4.72% in Jaccard score.
Covariance matrix estimation is an important problem in statistics, with wide applications in finance, neuroscience, meteorology, oceanography, and other fields. However, when the data are high-dimensional and constantly generated and updated in a streaming fashion, covariance matrix estimation faces huge challenges, including the curse of dimensionality and limited memory space. Existing methods either assume sparsity, ignoring any possible common factor among the variables, or perform poorly when recovering the covariance matrix directly from sketched data. To address these issues, we propose a novel method, KEEF: Knowledge-based Time and Memory Efficient Covariance Estimator in Factor Model, and its extended variation. Our method leverages historical data to train a knowledge-based sketch matrix, which is used to accelerate the factor analysis of streaming data and directly estimate the covariance matrix from the sketched data. We provide theoretical guarantees, showing the advantages of our method in terms of time and space complexity, as well as accuracy. We conduct extensive experiments on synthetic and real-world data, comparing KEEF with several state-of-the-art methods and demonstrating the superior performance of our method.
Spatio-temporal prediction is a crucial research area in data-driven urban computing, with implications for transportation, public safety, and environmental monitoring. However, scalability and generalization challenges remain significant obstacles. Advanced models often rely on Graph Neural Networks (GNNs) to encode spatial and temporal correlations, but struggle with the increased complexity of large-scale datasets. The recursive GNN-based message passing schemes used in these models hinder their training and deployment in real-life urban sensing scenarios. Moreover, long-spanning, large-scale spatio-temporal data introduce distribution shifts, necessitating improved generalization performance. To address these challenges, we propose EasyST, a simple framework for spatio-temporal prediction. It learns lightweight and robust Multi-Layer Perceptrons (MLPs) by effectively distilling knowledge from complex spatio-temporal GNNs. We ensure robust knowledge distillation by integrating the spatio-temporal information bottleneck with a teacher-bounded regression loss, filtering out task-irrelevant noise and avoiding erroneous guidance. We further enhance the generalization ability of the student model by incorporating spatial and temporal prompts to provide downstream task contexts. Evaluation on three spatio-temporal datasets for urban computing tasks demonstrates that EasyST surpasses state-of-the-art approaches in terms of efficiency and accuracy. The implementation code is available at https://github.com/HKUDS/EasyST.
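The teacher-bounded idea can be sketched as follows (our simplification, not EasyST's full objective, which also includes the information-bottleneck term and prompts): the student regresses to ground truth, and distillation toward the teacher is applied only where the teacher is actually better than the student.

```python
import torch

def teacher_bounded_loss(student_pred, teacher_pred, target, lam=0.5):
    """Ground-truth MSE plus distillation only where the teacher beats the student."""
    gt = (student_pred - target).pow(2).mean()
    s_err = (student_pred - target).pow(2).sum(dim=-1)
    t_err = (teacher_pred - target).pow(2).sum(dim=-1)
    mask = (t_err < s_err).float()    # trust the teacher only where it is more accurate
    kd = (mask * (student_pred - teacher_pred.detach()).pow(2).sum(dim=-1)).mean()
    return gt + lam * kd

loss = teacher_bounded_loss(torch.randn(32, 4), torch.randn(32, 4), torch.randn(32, 4))
```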
The drug recommendation task aims to predict safe and effective drug prescriptions based on patients' historical electronic health records (EHRs). However, existing drug recommendation models generally have two limitations. First, they neglect the inherent multi-view characteristics of patients' clinical data (e.g., diagnoses and procedures), leading to fragmented and inconsistent patient representations. Second, they do not fully exploit drug label information: most models do not explicitly establish a mapping between drug labels and patients' historical visits. To address these two problems, we propose a label-aware multi-view drug recommendation model named LAMRec. In particular, LAMRec uses a cross-attention module to fuse information from the diagnosis and procedure views and increases the mutual information of patient multi-view representations through a multi-view contrastive loss; a label-wise attention mechanism fully exploits drug label information by constructing an adaptive drug-visit mapping to generate personalized representations that are aware of drug-related visit information. Experiments on three real-world medical datasets demonstrate the superiority of LAMRec, with a relative reduction of 5.25% in DDI compared to the best baseline, a relative improvement of 4.20% in Jaccard similarity scores, and a relative improvement of 3.10% in F1 scores. We release the code at: https://github.com/Tyunsen/LAMRec.
Recently, item textual information has been exploited with pre-trained language models (PLMs) to enrich the representations of tail items. The underlying idea is to align hot items and tail items in terms of the external semantic knowledge covered by the PLM. However, it is non-trivial to eliminate popularity bias by exploiting textual semantics alone. One major obstacle is that model supervision still relies on sparse, binary user behaviors. In a preliminary investigation, we discover that text-based recommendations also suffer from popularity bias.
To this end, we propose a novel self-distillation framework based on a pre-trained language model, named Staple. Staple consists of two main components, a ranker model and a recommender model, both instantiated as PLMs to exploit item textual semantics. Motivated by the recent success of reinforcement learning with human feedback (RLHF), Staple aims to recover relative preferences by learning a fair ranker model that can successfully distinguish preference levels for uninteracted items. Specifically, analogous to the training of large language models (LLMs), we introduce pre-training and fair supervised fine-tuning with a decoupled layer to build the ranker model. Then, similar to RLHF for LLM training, we utilize the relative preference information estimated by the ranker over candidate items to complement the learning of the recommender model. We show that this RLHF-style process can be reformulated as an efficient distillation learning process. We conduct extensive experiments on three real-world datasets. In addition to the performance metrics, we employ two additional metrics to measure fairness and debiased performance. The experiments show that our method can significantly improve the item exposure fairness of recommendation and mitigate popularity bias, while also improving recommendation performance. The source code is available at https://github.com/WHUIR/STAPLE.
Deep learning for tabular data has garnered increasing attention in recent years, yet employing deep models for structured data remains challenging. While these models excel with unstructured data, their efficacy with structured data has been limited. Recent research has introduced retrieval-augmented models to address this gap, demonstrating promising results in supervised tasks such as classification and regression. In this work, we investigate using retrieval-augmented models for anomaly detection on tabular data. We propose a reconstruction-based approach in which a transformer model learns to reconstruct masked features of normal samples. We test the effectiveness of KNN-based and attention-based modules to select relevant samples that help in the reconstruction of the target sample. Our experiments on a benchmark of 31 tabular datasets reveal that augmenting this reconstruction-based anomaly detection (AD) method with sample-sample dependencies via retrieval modules significantly boosts performance. The present work supports the idea that retrieval modules can be used to augment deep AD methods and enhance anomaly detection on tabular data. Our code to reproduce the experiments is available on GitHub.
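A toy version of the retrieval idea, not the paper's transformer: each test row retrieves its k nearest normal training rows, and the anomaly score is how badly a crude mask-and-reconstruct step (here, imputing from the retrieved neighbors) recovers the masked features. The data, k, and masking ratio are made up for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 8))                       # "normal" tabular samples
test = np.vstack([rng.normal(size=(5, 8)),
                  rng.normal(loc=4.0, size=(5, 8))])    # last 5 rows are anomalous

knn = NearestNeighbors(n_neighbors=5).fit(train)

def anomaly_score(x, mask_frac=0.5):
    d = x.shape[0]
    masked = rng.choice(d, size=int(d * mask_frac), replace=False)
    x_obs = x.copy()
    x_obs[masked] = 0.0                                  # hide the masked features
    _, idx = knn.kneighbors(x_obs[None, :])
    recon = train[idx[0]].mean(axis=0)                   # reconstruct from retrieved neighbors
    return float(np.abs(recon[masked] - x[masked]).mean())

scores = [anomaly_score(x) for x in test]
print(np.round(scores, 2))                               # anomalous rows get larger scores
```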
Conversational assistants are becoming prevalent among the wider population due to their simplicity and increasing utility. However, the shortcomings of these tools are as renowned as their benefits. In this work, we present a "first look" at an extensive collection of conversational queries, aiming to identify limitations and improvement opportunities specifically related to information access (i.e., search interactions). We explore over 600,000 Google Assistant interactions from 173 unique users, examining usage trends and the resulting deficiencies and strengths of these assistants. We aim to provide a balanced assessment, highlighting both where the assistant falls short in supporting users and delivering information relevant to their needs, and where it responds reasonably to user inputs. Our analysis shows that, although most users conduct information-seeking tasks, there is little evidence of complex information-seeking behaviour, with most interactions consisting of simple, imperative instructions. Finally, we find that conversational devices allow users to benefit from more naturalistic interactions and the ability to apply acquired information in situ, a novel observation for conversational information seeking.
The progress of deep-learning-based forecasting architectures is evident in their expanding parameter configurations. However, the need for rapid online decision making in practical scenarios calls for an alternative strategy, highlighting the necessity for networks that are not only adaptive but also efficient in real-time operation. This shift is critical as we confront three principal challenges in deep-learning-based forecasting frameworks: (i) the inherent limitations of transformers, which, despite attempts to preserve ordering information, inevitably lose temporal information due to the permutation-invariant nature of self-attention; (ii) the inefficacy of linear models in capturing dynamic interactions within swiftly evolving signals; and (iii) the inability of tree-based approaches to extrapolate beyond values present in the training set. In response to these challenges, we introduce LTBoost, a boosted hybrid of linear and tree-based ensemble gradient algorithms tailored for long-term time series forecasting (LTSF) tasks and scalable to high data dimensions. LTBoost employs a dual strategy, beginning with a linear regression model to capture trends and extrapolate beyond known data, complemented by a robust nonlinear tree-based model that focuses on the residuals. This boosted hybrid approach not only addresses the challenges posed by existing models but also significantly improves forecast accuracy. The effectiveness of LTBoost is validated through empirical experiments on nine well-established benchmark datasets, achieving state-of-the-art results in 32 out of 36 cases as measured by mean absolute error (MAE). Our findings also explore the impact of lag features and signal normalization techniques, demonstrating further improvements in predictive accuracy. This hybrid approach addresses specific forecasting challenges and paves the way for application in diverse real-world scenarios.
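The core hybrid can be sketched in a few lines with off-the-shelf components: a linear model captures the trend and can extrapolate, and a gradient-boosted tree ensemble is fitted on its residuals. This is our minimal illustration of the idea, not the LTBoost implementation; the lag-feature construction and hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
t = np.arange(1000, dtype=float)
y = 0.05 * t + np.sin(t / 20.0) + rng.normal(scale=0.2, size=t.size)  # trend + seasonality + noise

lags = 24
X = np.stack([y[i:i + lags] for i in range(len(y) - lags)])           # lag features
target = y[lags:]
X_tr, X_te, y_tr, y_te = X[:-100], X[-100:], target[:-100], target[-100:]

linear = LinearRegression().fit(X_tr, y_tr)                           # captures trend, can extrapolate
resid = y_tr - linear.predict(X_tr)
trees = GradientBoostingRegressor(n_estimators=200, max_depth=3).fit(X_tr, resid)

pred = linear.predict(X_te) + trees.predict(X_te)                     # hybrid forecast
print("MAE:", np.abs(pred - y_te).mean())
```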
Despite their popularity, deep neural networks (DNNs) applied to time series forecasting often fail to beat simpler statistical models. One of the main causes of this suboptimal performance is the data non-stationarity present in many processes. In particular, changes in the mean and variance of the input data can disrupt the predictive capability of a DNN. In this paper, we first show how DNN forecasting models fail in simple non-stationary settings. We then introduce GAS-Norm, a novel methodology for adaptive time series normalization and forecasting based on the combination of a Generalized Autoregressive Score (GAS) model and a Deep Neural Network. The GAS approach encompasses a score-driven family of models that estimate the mean and variance at each new observation, providing updated statistics to normalize the input data of the deep model. The output of the DNN is eventually denormalized using the statistics forecasted by the GAS model, resulting in a hybrid approach that leverages the strengths of both statistical modeling and deep learning. The adaptive normalization improves the performance of the model in non-stationary settings. The proposed approach is model-agnostic and can be applied to any DNN forecasting model. To empirically validate our proposal, we first compare GAS-Norm with other state-of-the-art normalization methods. We then combine it with state-of-the-art DNN forecasting models and test them on real-world datasets from the Monash open-access forecasting repository. Results show that deep forecasting models improve their performance in 21 out of 25 settings when combined with GAS-Norm compared to other normalization methods.
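A simplified illustration of the normalize/denormalize loop, assuming a Gaussian score-driven filter with inverse-Fisher scaling (which reduces to exponentially weighted mean and variance updates); GAS-Norm pairs such a filter with a DNN forecaster, whereas here "model" is any callable.

```python
import numpy as np

def gas_norm_forecast(y, model, a=0.1, b=0.05):
    """Normalize each observation with filtered mean/variance, forecast, then denormalize."""
    mu, var = y[0], np.var(y[:10]) + 1e-6
    normed = []
    for obs in y:
        err = obs - mu
        normed.append(err / np.sqrt(var))
        mu = mu + a * err                        # score-driven mean update
        var = (1 - b) * var + b * err ** 2       # score-driven variance update
    z_pred = model(np.array(normed))             # forecast in normalized space
    return z_pred * np.sqrt(var) + mu            # denormalize with the latest statistics

naive = lambda z: z[-1]                          # toy stand-in for a DNN forecaster
y = np.cumsum(np.random.default_rng(1).normal(size=200)) + 50
print(gas_norm_forecast(y, naive))
```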
Dual encoders are highly effective and widely deployed in the retrieval phase for passage and document ranking, question answering, or retrieval-augmented generation (RAG) setups. Most dual-encoder models use transformer models like BERT to map input queries and output targets to a common vector space encoding the semantic similarity. Despite their prevalence and impressive performance, little is known about the inner workings of dense encoders for retrieval. We investigate neural retrievers using the probing paradigm to identify well-understood IR properties that causally result in ranking performance. Unlike existing works that have probed cross-encoders to show query-document interactions, we provide a principled approach to probe dual-encoders. Importantly, we employ causal probing to avoid correlation effects that might be artefacts of vanilla probing. We conduct extensive experiments on one such dual encoder (TCT-ColBERT) to check for the existence and relevance of six properties: term importance, lexical matching (BM25), semantic matching, question classification, and the two linguistic properties of named entity recognition and coreference resolution. Our layer-wise analysis shows important differences between re-rankers and dual encoders, establishing which tasks are not only understood by the model but also used for inference.
Various social media platforms, e.g., Twitter and Reddit, allow people to disseminate a plethora of information more efficiently and conveniently. However, they are inevitably full of misinformation, causing damage to diverse aspects of our daily lives. To reduce the negative impact, timely identification of misinformation, namely Misinformation Detection (MD), has become an active research topic receiving widespread attention. As a complex phenomenon, the veracity of an article is influenced by various aspects. In this paper, we are inspired by the opposition of intents between misinformation and real information. Accordingly, we propose to reason about the intent of articles and form the corresponding intent features to promote the veracity discrimination of article features. To achieve this, we build a hierarchy of intents for both misinformation and real information by referring to existing psychological theories, and we apply it to reason about the intent of articles by progressively generating binary answers with an encoder-decoder structure. We form the corresponding intent features and integrate them with the token features to achieve more discriminative article features for MD. Upon these ideas, we propose a novel MD method, namely Detecting Misinformation by Integrating Intent featuRes (DM-INTER). To evaluate the performance of DM-INTER, we conduct extensive experiments on benchmark MD datasets. The experimental results validate that DM-INTER outperforms existing baseline MD methods.
Traditional recommender systems have primarily relied on identity representations (IDs) to model users and items. Recently, the integration of pre-trained language models (PLMs) has enhanced the capability to capture semantic descriptions of items. However, while PLMs excel in few-shot, zero-shot, and unified modeling scenarios, they often overlook the crucial signals from collaborative filtering (CF), resulting in suboptimal performance when sufficient training data is available. To effectively combine semantic representations with the CF signal and enhance recommender system performance in both warm and cold settings, two major challenges must be addressed: (1) bridging the gap between semantic and collaborative representation spaces, and (2) refining while preserving the integrity of semantic representations. In this paper, we introduce CARec, a novel model that adeptly integrates collaborative filtering signals with semantic representations, ensuring alignment within the semantic space while maintaining essential semantics. We present experimental results from four real-world datasets, which demonstrate significant improvements. By leveraging collaborative alignment, CARec also shows remarkable effectiveness in cold-start scenarios, achieving notable enhancements in recommendation performance. The code is available at https://github.com/ChenMetanoia/CARec.
Graph self-training (GST), which selects and assigns pseudo-labels to unlabeled nodes, is popular for tackling label sparsity in graphs. However, recent studies on homophilic graphs show that GST methods can introduce and amplify distribution shift between training and test nodes, as they tend to assign pseudo-labels to nodes they are already good at. Since GNNs typically perform better on homophilic nodes, there can be a shift towards homophilic pseudo-nodes, which remains underexplored. Our preliminary experiments on heterophilic graphs verify that these methods can cause shifts in homophily ratio distributions, leading to a training bias that improves performance on homophilic nodes while degrading it on heterophilic ones. We therefore study the novel problem of reducing homophily ratio distribution shifts during self-training on heterophilic graphs. A key challenge is the accurate calculation of homophily ratios and their distributions without extensive labeled data. To tackle this, we propose a novel Heterophily-aware Distribution Consistency-based Graph Self-Training (HC-GST) framework, which estimates homophily ratios using soft labels and optimizes a selection vector to align pseudo-nodes with the global homophily ratio distribution. Extensive experiments on both homophilic and heterophilic graphs show that HC-GST effectively reduces training bias and enhances self-training performance.
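For concreteness, the statistic being aligned can be computed as below: the per-node homophily ratio, i.e. the expected fraction of a node's neighbors sharing its label, estimated here from (possibly soft) label distributions. This is our illustration of the quantity, not the HC-GST selection procedure.

```python
import torch

def node_homophily(edge_index, label_probs):
    """edge_index: [2, num_edges] directed edges; label_probs: [num_nodes, num_classes]."""
    src, dst = edge_index
    agree = (label_probs[src] * label_probs[dst]).sum(dim=1)   # P(same label) per edge
    num_nodes = label_probs.size(0)
    ratio = torch.zeros(num_nodes).scatter_add_(0, src, agree)
    deg = torch.zeros(num_nodes).scatter_add_(0, src, torch.ones_like(agree))
    return ratio / deg.clamp(min=1)

edges = torch.tensor([[0, 0, 1, 2], [1, 2, 0, 0]])
probs = torch.tensor([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])      # soft labels
print(node_homophily(edges, probs))                              # tensor([0.46, 0.74, 0.18])
```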
Polymers are high-molecular-weight compounds constructed by the covalent bonding of numerous identical or similar monomers, so their 3D structures are complex yet exhibit non-negligible regularity. Typically, the properties of a polymer, such as plasticity, conductivity, and bio-compatibility, are highly correlated with its 3D structure. However, existing polymer property prediction methods rely heavily on information learned from polymer SMILES sequences (P-SMILES strings) while ignoring crucial 3D structural information, resulting in sub-optimal performance. In this work, we propose MMPolymer, a novel multimodal multitask pretraining framework that incorporates polymer 1D sequential and 3D structural information to benefit downstream polymer property prediction tasks. Moreover, considering the scarcity of polymer 3D data, we further introduce the "Star Substitution" strategy to extract 3D structural information effectively. During pretraining, in addition to predicting masked tokens and recovering 3D coordinates, MMPolymer achieves cross-modal alignment of latent representations. We then fine-tune the pretrained MMPolymer for downstream polymer property prediction tasks in the supervised learning paradigm. Experiments show that MMPolymer achieves state-of-the-art performance on downstream property prediction tasks. Moreover, given the pretrained MMPolymer, using merely a single modality in the fine-tuning phase can also outperform existing methods, showcasing the exceptional capability of MMPolymer in polymer feature extraction and utilization.
To ensure AI safety, instruction-tuned Large Language Models (LLMs) are specifically trained for alignment, which refers to making models behave in accordance with human intentions. While these models have demonstrated commendable results on various safety benchmarks, the vulnerability of their safety alignment has not been extensively studied. This is particularly troubling given the potential harm that LLMs can inflict. Existing attack methods on LLMs often rely on poisoned training data or the injection of malicious prompts. These approaches compromise the stealthiness and generalizability of the attacks, making them susceptible to detection. Additionally, they often demand substantial computational resources, making them less practical for real-world applications. In this work, we study a different attack scenario, called Trojan Activation Attack (TA2), which injects trojan steering vectors into the activation layers of LLMs. These malicious steering vectors can be triggered at inference time to steer the models toward attacker-desired behaviors by manipulating their activations. Our experimental results on four primary alignment tasks show that TA2 is highly effective and adds little or no overhead at attack time. Additionally, we discuss potential countermeasures against such activation attacks.
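A conceptual sketch of the attack surface described above (our simplification, not the TA2 implementation): a forward hook adds a fixed steering vector to one layer's hidden states at inference time. The module path in the commented usage is hypothetical and depends on the specific LLM architecture.

```python
import torch

def install_steering_hook(layer: torch.nn.Module, steering_vector: torch.Tensor, scale: float = 4.0):
    """Register a forward hook that shifts the layer's hidden states by a steering vector."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + scale * steering_vector.to(hidden.dtype).to(hidden.device)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return layer.register_forward_hook(hook)

# Hypothetical usage (module path depends on the model):
# handle = install_steering_hook(model.transformer.h[6], steering_vector)
# ... run generation; call handle.remove() to disable the trigger.
```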
Personalized Session-based Recommendation (PSR) extends traditional sequential recommendation models, which typically recommend the next item based on the current active session, to leverage a user's historical sessions for short-term recommendations in the current session. However, existing PSR methods face two limitations: (1) treating offline sessions uniformly as static data and relying on user embeddings to represent personalized information overlooks the dynamic evolution of interests over time, which can change significantly as sessions progress in practical applications; (2) focusing only on accuracy, i.e., recommending items relevant to recent interactions, ignores the balance of multi-faceted requirements for user satisfaction, namely diversity, novelty, and serendipity.
Therefore, we introduce the Multi-objective PSR (MOPSR) task and propose a Hierarchical Decision Transformers (HDT) framework, which models strictly sequential preference transitions of users across and within sessions to balance recommendation accuracy with the aforementioned objectives. To address the first problem, an Inter-session DT dynamically tracks the user's long-term preference across sessions by maintaining a goal state. This goal state serves as personalized information to collaboratively make recommendations with the short-term state via the Intra-session DT. To tackle the second limitation, we propose inter-session and intra-session unexpected returns to trade off relevant recommendations against user preferences for diversity, novelty, and serendipity. The hierarchical returns help the recommender accurately identify signals of the user's expectations and changes in multi-objective preferences. To verify the effectiveness of our method on MOPSR, we apply HDT to four state-of-the-art sequential recommendation models and conduct experiments on two publicly available datasets. Experimental results demonstrate that (1) HDT can widely generalize sequential models to solve the MOPSR task in scenarios with incrementally generated sessions, and (2) our method can balance multiple objectives by maintaining and even enhancing accuracy while effectively improving diversity, novelty, and serendipity.
The surge of fake news on social networks has drawn increasing research attention to its detection, particularly with deep learning models based on graph neural networks (GNNs). However, as research progresses, concerns have emerged about the vulnerability of these detection models. In this study, we introduce an attack problem that perturbs user-news engagements by injecting bots to shield targeted fake news from detection by GNN-based fake news detection models. We propose a black-box attack method named Query-enhanced Surrogate-based Attack under Assortativity Constraint (QSA-AC) for this attack problem. QSA-AC combines surrogate-based and query-based approaches to improve attack effectiveness. At the same time, QSA-AC maintains a balance between attack effectiveness and imperceptibility by adjusting the local fluctuations of the assortativity with respect to the news on the social network. In addition, we introduce an evaluation metric, the local strength assortativity perturbation rate (LSAPR), to assess the imperceptibility of the attack from a local perspective. Extensive experiments on two fake news datasets demonstrate that the proposed QSA-AC achieves the best attack effectiveness and controls the trade-off between attack effectiveness and imperceptibility.
It is widely acknowledged that extracting market sentiment from news data benefits market prediction. However, existing methods of using financial sentiments remain simplistic, relying on equal-weight and static aggregation to manage sentiments from multiple news items. This leads to a critical issue we term "Aggregated Sentiment Homogenization", which we have explored through an analysis of a large financial news dataset from industry practice. This phenomenon occurs when aggregating numerous sentiments causes representations to converge towards the mean values of sentiment distributions, thereby smoothing out unique and important information. Consequently, the aggregated sentiment representations lose much of the predictive value of the news data. To address this problem, we introduce the Market Attention-weighted News Aggregation Network (MANA-Net), a novel method that leverages a dynamic market-news attention mechanism to aggregate news sentiments for market prediction. MANA-Net learns the relevance of news sentiments to price changes and assigns varying weights to individual news items. By integrating the news aggregation step into the network for market prediction, MANA-Net allows for trainable sentiment representations that are optimized directly for prediction. We evaluate MANA-Net using the S&P 500 and NASDAQ 100 indices, along with financial news spanning 2003 to 2018. Experimental results demonstrate that MANA-Net outperforms various recent market prediction methods, enhancing Profit & Loss by 1.1% and the daily Sharpe ratio by 0.252.
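Schematically, the attention-weighted aggregation can be contrasted with an equal-weight mean as below; this is our own minimal illustration, not MANA-Net itself, and the scoring function and dimensions are placeholders.

```python
import torch
import torch.nn as nn

class NewsAttentionPool(nn.Module):
    """Aggregate a day's news sentiment vectors with learned, unequal weights."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))   # learnable "market" query

    def forward(self, news_emb):                       # news_emb: [num_news, dim]
        weights = torch.softmax(news_emb @ self.query, dim=0)
        return weights @ news_emb                      # [dim] attention-weighted day representation

news_emb = torch.randn(30, 16)                          # 30 news items on one trading day
weighted = NewsAttentionPool(16)(news_emb)
equal = news_emb.mean(dim=0)                            # the equal-weight baseline it replaces
```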
In Reinforcement Learning-based Recommender Systems (RLRS), the complexity and dynamism of user interactions often result in high-dimensional and noisy state spaces, making it challenging to discern which aspects of the state truly drive the decision-making process. This issue is exacerbated by the evolving nature of user preferences and behaviors, requiring the recommender system to adaptively focus on the most relevant information for decision-making while preserving generalizability. To tackle this problem, we introduce an innovative causal approach for decomposing the state and extracting Causal-InDispensable State Representations (CIDS) in RLRS. Our method concentrates on identifying the Directly Action-Influenced State Variables (DAIS) and Action-Influence Ancestors (AIA), which are essential for making effective recommendations. By leveraging conditional mutual information, we develop a framework that not only discerns the causal relationships within the generative process but also isolates critical state variables from the typically dense and high-dimensional state representations. We provide theoretical evidence for the identifiability of these variables. Then, by making use of the identified causal relationships, we construct causal-indispensable state representations, enabling the training of policies over a more advantageous subset of the agent's state space. We demonstrate the efficacy of our approach through extensive experiments, showing that our method outperforms state-of-the-art methods.
Utilizing powerful Large Language Models (LLMs) for generative recommendation has attracted much attention. Nevertheless, a crucial challenge is transforming recommendation data into the language space of LLMs through effective item tokenization. Current approaches, such as ID-based, textual, and codebook-based identifiers, exhibit shortcomings in encoding semantic information, incorporating collaborative signals, or handling code assignment bias. To address these limitations, we propose LETTER (a LEarnable Tokenizer for generaTivE Recommendation), which integrates hierarchical semantics, collaborative signals, and code assignment diversity to satisfy the essential requirements of identifiers. LETTER incorporates a Residual Quantized VAE for semantic regularization, a contrastive alignment loss for collaborative regularization, and a diversity loss to mitigate code assignment bias. We instantiate LETTER on two models and propose a ranking-guided generation loss that theoretically augments their ranking ability. Experiments on three datasets validate the superiority of LETTER, advancing the state of the art in LLM-based generative recommendation.
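The residual quantization at the core of an RQ-VAE tokenizer can be sketched as follows (a toy nearest-neighbor version; codebook sizes, depth, and LETTER's contrastive and diversity losses are omitted): each level quantizes what the previous levels left unexplained, yielding a short hierarchical code per item.

```python
import torch

def residual_quantize(z, codebooks):
    """z: [batch, dim] item embeddings; codebooks: list of [codebook_size, dim] tensors."""
    residual, codes, quantized = z, [], torch.zeros_like(z)
    for cb in codebooks:
        dists = torch.cdist(residual, cb)            # distance to every code at this level
        idx = dists.argmin(dim=1)                     # pick the nearest code
        codes.append(idx)
        quantized = quantized + cb[idx]
        residual = residual - cb[idx]                 # quantize the remainder at the next level
    return torch.stack(codes, dim=1), quantized       # hierarchical ids, reconstruction

codebooks = [torch.randn(32, 16) for _ in range(3)]   # 3-level toy codebooks
ids, recon = residual_quantize(torch.randn(8, 16), codebooks)
print(ids.shape)                                       # torch.Size([8, 3])
```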
Deep neural networks (DNNs) are known to be vulnerable to adversarial examples. To facilitate the study of model safety, transfer-based attacks employ surrogate models to craft adversarial examples. In this work, we first study the intricate mechanisms of such attacks. We observe a correlation between the sharpness of decision boundaries in model-sensitive regions and overfitting during adversarial training, which hampers the adversarial examples' transferability. To address this issue, we propose a novel approach termed Frequency-Guided Sample Relevance Attack (FGSRA). Specifically, we leverage frequency information to explore similar sensitive regions across different models, thereby generating neighborhood samples. Additional similarity weights are subsequently introduced to assess the adversarial contribution of the neighborhood samples. A hybrid gradient is then obtained to thoroughly exploit neighborhood information within input samples. Extensive experiments demonstrate the strong performance of our approach. Compared to other state-of-the-art baselines with Inc-v3 as the surrogate model, our method achieves an average improvement of 27.21% on normally trained CNNs and 42.1% on adversarially trained CNNs. Moreover, we achieve an average improvement of 24.6% on ViTs. Our code is available at: https://github.com/LMBTough/FGSRA
Generative models have emerged as a promising utility to enhance recommender systems. It is essential to model both item content and user-item collaborative interactions in a unified generative framework for better recommendation. Although some existing large language model (LLM)-based methods contribute to fusing content information and collaborative signals, they fundamentally rely on textual language generation, which is not fully aligned with the recommendation task. How to integrate content knowledge and collaborative interaction signals in a generative framework tailored for item recommendation is still an open research challenge.
In this paper, we propose content-based collaborative generation for recommender systems, namely ColaRec. ColaRec is a sequence-to-sequence framework tailored for directly generating the recommended item identifier. Precisely, the input sequence comprises data pertaining to the user's interacted items, and the output sequence represents the generative identifier (GID) for the suggested item. To model collaborative signals, the GIDs are constructed from a pretrained collaborative filtering model, and the user is represented as the content aggregation of interacted items. In this way, ColaRec captures both collaborative signals and content information in a unified framework. An item indexing task is then proposed to align the content-based semantic space and the interaction-based collaborative space. Besides, a contrastive loss is further introduced to ensure that items with similar collaborative GIDs have similar content representations. To verify the effectiveness of ColaRec, we conduct experiments on four benchmark datasets. Empirical results demonstrate the superior performance of ColaRec.
Drug combinations can cause adverse drug-drug interactions (DDIs), and identifying the specific effects is crucial for developing safer therapies. Previous works on DDI event prediction have typically been limited to using labels of specific events as supervision, which renders them insufficient to address two significant challenges: (1) the bias caused by a highly imbalanced event distribution, where certain interaction types are vastly underrepresented, and (2) the scarcity of labeled data for rare events, a pervasive issue where rare yet potentially critical interactions are often overlooked or under-explored due to limited available data. In response, we propose DDIPrompt, an innovative solution inspired by recent advancements in graph prompt learning. Our framework addresses these issues by leveraging the intrinsic knowledge of pre-trained models, which can be efficiently deployed with minimal downstream data. Specifically, to address the first challenge, DDIPrompt features a hierarchical pre-training strategy that fosters a generalized and comprehensive understanding of drug properties: it captures intra-molecular structures through augmented links based on structural proximity between drugs, and further learns inter-molecular interactions by emphasizing edge connections rather than concrete categories. For the second challenge, we implement a prototype-enhanced prompting mechanism during inference. This mechanism, refined by few-shot examples from each category, effectively harnesses the rich pre-training knowledge to enhance prediction accuracy, particularly for rare but crucial interactions. Comprehensive evaluations on two benchmark datasets demonstrate DDIPrompt's SOTA performance, especially for rare DDI events.
Retrieval-augmented generation (RAG) has been used to augment language models by retrieving texts from external databases. Since real-world texts are often connected in a graph (e.g., papers in citation networks), we use these relations to guide the retrieval process of RAG. Concretely, we investigate proximity- and role-based relations, where the former considers topologically close nodes and the latter considers structurally similar nodes. We empirically verify their correlation to text relations, which motivates us to propose the framework of Topology-aware Retrieval-augmented Generation for text generation, which consists of a retrieval module to retrieve texts by their topological relations and an aggregation module to compose retrieved texts into prompts triggering LLMs for text generation. Extensive experiments verify the effectiveness of this framework, signifying the potential of equipping RAG with topological awareness.
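A minimal sketch of proximity-based retrieval, assuming a toy citation graph and made-up texts: candidates are ranked by shortest-path distance to the query node and the top-k texts are packed into a prompt. This only illustrates the retrieval module's idea, not the paper's implementation.

```python
# Rank candidate documents by topological proximity to the query node in a
# small citation graph, then compose the top-k texts into a prompt.
import networkx as nx

def topo_retrieve(graph: nx.Graph, texts: dict, query_node, k: int = 2):
    dist = nx.single_source_shortest_path_length(graph, query_node)
    candidates = [(n, d) for n, d in dist.items() if n != query_node]
    candidates.sort(key=lambda nd: nd[1])               # closer nodes first
    return [texts[n] for n, _ in candidates[:k]]

G = nx.Graph([("q", "a"), ("a", "b"), ("q", "c")])
texts = {"a": "Paper A abstract.", "b": "Paper B abstract.", "c": "Paper C abstract."}
retrieved = topo_retrieve(G, texts, "q")
prompt = "Context:\n" + "\n".join(retrieved) + "\nWrite a related-work paragraph."
print(prompt)
```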
The topology of diffusion networks plays an essential role in understanding information propagation dynamics and conducting social network analysis. However, diffusion networks are often unobservable in practical applications, leading to wide research on network inference from information cascades over the past decade. More recently, cascade-based methods have been developed to recover temporal dynamics and network topology by exploiting node temporal information, resulting in notable advancements. However, acquiring extensive temporal information is costly, and the performance of network inference may decrease due to potential observational errors. Therefore, this paper specifically focuses on the time-independent scenario to address these limitations. Firstly, this paper models the node statuses of each diffusion process by leveraging the assumption of propagation trees based on the well-known independent cascade model. Subsequently, a gradient-based approach is developed to estimate the influences between nodes, facilitating the inference of network structure. Furthermore, this paper proposes a Monte Carlo EM-based approach to enhance the efficiency of network inference while maintaining comparable accuracy. Extensive experiments are conducted to verify the efficiency and effectiveness of our approaches on both synthetic and real-world networks.
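For readers unfamiliar with the underlying generative assumption, the following toy simulation of the independent cascade model produces exactly the kind of time-independent observation the paper works with: only the final set of activated nodes, with no timestamps. The edge probabilities and helper name are invented.

```python
# Toy simulation of the independent cascade (IC) model: each newly activated
# node gets one chance to activate each neighbor with the edge's influence
# probability; only the final activated set is returned (no timestamps).
import random

def simulate_ic(edges_prob: dict, seeds: set, rng=random.Random(0)):
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in edges_prob.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

edges_prob = {"a": {"b": 0.6, "c": 0.3}, "b": {"d": 0.5}, "c": {"d": 0.2}}
print(simulate_ic(edges_prob, {"a"}))
```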
Many platforms, such as e-commerce websites, offer both search and recommendation services simultaneously to better meet users' diverse needs. Recommendation services suggest items based on user preferences, while search services allow users to actively search for items before recommendations are provided. Since users and items are often shared between the search and recommendation domains, there is a valuable opportunity to enhance the recommendation domain by leveraging user preferences extracted from the search domain. Existing approaches either overlook the shift in user intention between these domains or fail to capture the significant impact of learning from users' search queries on understanding their interests.
In this paper, we propose a framework that learns from user search query embeddings within the context of user preferences in the recommendation domain. Specifically, user search query sequences from the search domain are used to predict the items users will click at the next time point in the recommendation domain. Additionally, the relationship between queries and items is explored through contrastive learning. To address issues of data sparsity, a diffusion model is incorporated to infer positive items the user will select after searching with certain queries in a denoising manner, which is particularly effective in preventing false positives. By effectively extracting this information, the framework integrates the queries into click-through rate prediction in the recommendation domain. Experimental analysis demonstrates that our model outperforms state-of-the-art models in the recommendation domain.
As the demand for more personalized recommendation grows and commercial scenarios proliferate, the study of multi-scenario recommendation (MSR) has attracted much attention; MSR uses the data from all scenarios to simultaneously improve their recommendation performance. However, existing methods tend to integrate insufficient scenario knowledge and neglect learning personalized cross-scenario preferences, thus leading to sub-optimal performance. Meanwhile, though large language models (LLMs) have shown great capability of reasoning and capturing semantic information, the high inference latency and high computation cost of tuning hinder their implementation in industrial recommender systems. To fill these gaps, we propose an LLM-enhanced paradigm, LLM4MSR, in this work. Specifically, we first leverage the LLM to uncover multi-level knowledge from the designed scenario- and user-level prompts without fine-tuning the LLM, then adopt hierarchical meta networks to generate multi-level meta layers that explicitly improve the scenario-aware and personalized recommendation capability. Our experiments on the KuaiSAR-small, KuaiSAR, and Amazon datasets validate significant advantages of LLM4MSR: (i) effectiveness and compatibility with different multi-scenario backbone models, (ii) high efficiency and deployability on industrial recommender systems, and (iii) improved interpretability. The implemented code and data are available to ease reproduction.
Graph generation plays a vital role in a wide range of applications such as traffic analysis and drug discovery, thanks to its rapid and efficient generation speed coupled with precise and stable generation capabilities. The diffusion model, the dominant solution in the image generation domain, has recently shown its potential in graph generation. However, existing graph diffusion models trivialize the graph structure and prioritize the Euclidean space for graph generation, ignoring the intrinsic difference between non-Euclidean graph structures and Euclidean grid-like image/text data. The few graph diffusion models in hyperbolic space separate the embedding and diffusion processes and lose the geometric constraints in the diffusion process. The problem of generating graph structure in a generic Riemannian space largely remains open, and it faces several fundamental challenges. On the one hand, navigating and structuring graphs within Riemannian spaces poses greater difficulty; in other words, how to preserve adherence to the constraints of Riemannian geometry has not been touched in the literature. On the other hand, Riemannian operators for graph diffusion models are not available so far, which is inherently different from the Euclidean case. In light of the aforementioned issues, we turn to the notion of product space and propose a generic graph generation method, called the mixed-curvature Product Space Graph Diffusion Model (ProGDM). Specifically, ProGDM includes a Riemannian embedding module based on contrastive learning and a geometric diffusion model across multiple Riemannian sub-spaces. We evaluate the proposed ProGDM with extensive experiments on benchmark datasets, and the empirical results show that ProGDM achieves superior performance to the state-of-the-art methods.
Contrastive Learning (CL) enhances the training of sequential recommendation (SR) models through informative self-supervision signals. Existing methods often rely on data augmentation strategies to create positive samples and promote representation invariance, yet some strategies such as item reordering and item substitution may inadvertently alter user intent. Supervised Contrastive Learning (SCL) based methods offer an alternative to augmentation-based CL methods by selecting same-target sequences (interaction sequences with the same target item) to form positive samples. However, SCL-based methods suffer from the scarcity of same-target sequences and consequently lack enough signals for contrastive learning. In this work, we propose to use similar sequences (with different target items) as additional positive samples and introduce a Relative Contrastive Learning (RCL) framework for sequential recommendation. RCL comprises a dual-tiered positive sample selection module and a relative contrastive learning module. The former module selects same-target sequences as strong positive samples and selects similar sequences as weak positive samples. The latter module employs a weighted relative contrastive loss, ensuring that each sequence is represented closer to its strong positive samples than its weak positive samples. We apply RCL on two mainstream deep learning-based SR models, and our empirical results reveal that RCL achieves an average improvement of 4.88% over state-of-the-art SR methods on five public datasets and one private dataset. The code can be found at https://github.com/Cloudcatcher888/RCL.
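A hedged sketch of the "relative" objective (not the paper's exact weighted loss): an anchor sequence should be closer to its strong positive (same-target sequence) than to its weak positive (similar sequence) by a margin, while weak positives are still pulled in with a smaller weight.

```python
# Simplified relative-contrastive objective: strong positives must beat weak
# positives by a margin; all hyperparameters and shapes are illustrative.
import torch
import torch.nn.functional as F

def relative_contrastive_loss(anchor, strong_pos, weak_pos, margin=0.2, weak_weight=0.5):
    a, s, w = (F.normalize(x, dim=-1) for x in (anchor, strong_pos, weak_pos))
    sim_strong = (a * s).sum(-1)            # cosine similarity to strong positives
    sim_weak = (a * w).sum(-1)              # cosine similarity to weak positives
    rel = F.relu(margin - (sim_strong - sim_weak))   # relative constraint violation
    return (1 - sim_strong).mean() + weak_weight * rel.mean()

anchor = torch.randn(4, 64, requires_grad=True)
loss = relative_contrastive_loss(anchor, torch.randn(4, 64), torch.randn(4, 64))
loss.backward()
print(float(loss))
```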
Recognizing implicit discourse relations between texts is challenging due to the absence of explicit connectives. Encoding texts into distinguishable semantic representations is beneficial to connective-free relation determination. To enable encoders to produce clearly distinguishable representations, we propose a joint learning framework. It combines prototypical and adversarial learning as well as hub-migration based redistribution. We experiment on PDTB 2.0, PDTB 3.0 and the CoNLL-2016 shared task benchmark. Experimental results show that our methods yield substantial improvements compared to BERT, RoBERTa and DeBERTa baselines. In the separate comparison experiment where implicit connectives are disabled during training, our models outperform the previous work for the four main discourse relation types (Temporality, Comparison, Contingency and Expansion), achieving F1-scores of 60.75%, 63.48%, 75.70% and 77.09%, respectively. On the other hand, our models obtain performance comparable to part of the state-of-the-art models which adopt implicit connectives as observable hints for training. When pretrained language models are used as the backbones, our methods incur extra training time of about one hour at worst on a 40GB Tesla A100 GPU.
Anomaly detection is a crucial data mining problem due to its extensive range of applications. In real-world scenarios, anomalies often exhibit different levels of priority. Unfortunately, existing methods tend to overlook this phenomenon and lump all types of anomalies into a single class. In this paper, we propose a generalized formulation of the anomaly detection problem, which covers not only the conventional anomaly detection task, but also the partial anomaly detection task that focuses on identifying target anomalies of primary interest while intentionally disregarding non-target (low-risk) anomalies. One of the challenges in addressing this problem is the overlap among normal instances and anomalies of different levels of priority, which may cause high false positive rates. Additionally, acquiring a sufficient quantity of all types of labeled non-target anomalies is not always feasible. To this end, we present a generalized anomaly detection framework that is flexible in addressing a broader range of anomaly detection scenarios. Employing a dual-center mechanism to handle relationships among normal instances, non-target anomalies, and target anomalies, the proposed framework significantly reduces the number of false positives caused by class overlap and tackles the challenge of limited labeled data. Extensive experiments conducted on two publicly available datasets from different domains demonstrate the effectiveness, robustness and superior labeled data utilization of the proposed framework. When applied to a real-world application, it exhibits a lift of at least 7.08% in AUPRC compared to the alternatives, showcasing its remarkable practicality.
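One way to picture the dual-center mechanism, under the assumption that representations are scored by their distances to a "normal" center and a "target-anomaly" center, is the toy scorer below; the centers, dimensions, and decision rule are illustrative only.

```python
# Toy dual-center scorer: an instance looks like a target anomaly when it is
# far from the normal center and close to the target-anomaly center.
import numpy as np

def dual_center_score(z: np.ndarray, c_normal: np.ndarray, c_target: np.ndarray):
    d_norm = np.linalg.norm(z - c_normal, axis=1)
    d_targ = np.linalg.norm(z - c_target, axis=1)
    return d_norm - d_targ   # higher => more likely a target anomaly

rng = np.random.default_rng(0)
c_normal, c_target = np.zeros(8), np.full(8, 3.0)
z = np.vstack([rng.normal(0, 1, (5, 8)),      # normal-like representations
               rng.normal(3, 1, (5, 8))])     # target-anomaly-like representations
print(dual_center_score(z, c_normal, c_target).round(2))
```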
Customer Lifetime Value (CLTV) prediction is a critical task in business applications, such as customer relationship management (CRM), online marketing, etc. Accurately predicting CLTV is challenging in real-world business scenarios, as the distribution of CLTV is complex and mutable. First, a large number of users have no consumption at all, forming a long-tailed part of the distribution that is too complex to fit. Second, the small set of high-value users spend orders of magnitude more than typical users, leading to a wide-ranging CLTV distribution that is hard to capture with a single distribution. Existing approaches for CLTV estimation either assume a prior probability distribution and fit a single group of distribution-related parameters for all samples, or directly learn from the posterior distribution with manually predefined buckets in a heuristic manner. However, all these methods fail to handle complex and mutable distributions. In this paper, we propose a novel optimal distribution selection model (OptDist) for CLTV prediction, which utilizes an adaptive optimal sub-distribution selection mechanism to improve the accuracy of complex distribution modeling. Specifically, OptDist trains several candidate sub-distribution networks in the distribution learning module (DLM) for modeling the probability distribution of CLTV. Then, a distribution selection module (DSM) is proposed to select the sub-distribution for each sample, making the selection automatic and adaptive. Besides, we design an alignment mechanism that connects both modules, which effectively guides the optimization. We conduct extensive experiments on two public datasets and one private dataset to verify that OptDist outperforms state-of-the-art baselines. Furthermore, OptDist has been deployed on a large-scale financial platform for customer acquisition marketing campaigns, and online experiments also demonstrate the effectiveness of OptDist.
Continuous-time dynamics models, e.g., neural ordinary differential equations, enable accurate modeling of underlying dynamics in time-series data. However, employing neural networks for parameterizing dynamics makes it challenging for humans to identify dependence structures, especially in the presence of delayed effects. Consequently, these models are not an attractive option when capturing dependence is more important than accurate modeling, e.g., in tsunami forecasting.
In this paper, we present a novel method for identifying dependence structures in continuous-time dynamics models. We take a two-step approach: (1) during training, we promote weight sparsity in the model's first layer; (2) after training, we prune the sparse weights to identify dependence structures. In evaluation, we test our method in scenarios where the exact dependence structures of the time series are known. Compared to baselines, our method is more effective in uncovering dependence structures in data even when there are delayed effects. Moreover, we evaluate our method on a real-world tsunami-forecasting task, where the exact dependence structures are unknown beforehand. Even in this challenging scenario, our method still effectively learns physically consistent dependence structures and achieves high accuracy in forecasting.
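A minimal sketch of the two-step recipe on a toy regression problem, assuming an L1 penalty as the sparsity-promoting term and a simple column-norm threshold as the pruning rule (the paper's concrete choices may differ):

```python
# Step 1: train with an L1 penalty on the first layer so that weights from
# irrelevant inputs shrink. Step 2: prune near-zero input columns and read
# them off as "no dependence". All sizes and thresholds are toy placeholders.
import torch
import torch.nn as nn

class TinyDynamics(nn.Module):
    def __init__(self, n_inputs=5, hidden=16):
        super().__init__()
        self.first = nn.Linear(n_inputs, hidden)
        self.rest = nn.Sequential(nn.Tanh(), nn.Linear(hidden, n_inputs))

    def forward(self, x):
        return self.rest(self.first(x))

model = TinyDynamics()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.randn(256, 5)
y = x[:, :1].sin().repeat(1, 5)          # toy target that depends on input 0 only
for _ in range(200):
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean() + 1e-2 * model.first.weight.abs().sum()
    loss.backward()
    opt.step()

# Prune: input j is kept iff its column norm in the first layer is non-negligible.
col_norm = model.first.weight.abs().sum(dim=0)
print("inferred dependencies:", (col_norm > 0.1 * col_norm.max()).tolist())
```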
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by accessing external data sources, offering a promising way to improve accuracy and reliability. Despite its potential, conventional retrievers encounter bias and flaws with time-sensitive queries. In this paper, a benchmark query dataset is constructed to retrieve documents containing time-evolving facts, and the results show that current embedding-based similarity-matching methods struggle to handle queries with explicit temporal constraints. Therefore, we propose a novel approach that integrates supervised contrastive learning with tailored negative sample pairs for temporal constraints to train the retriever of an RAG system, along with query-side fine-tuning and routing techniques. Experimental results show that our approach significantly enhances the retriever performance of time-sensitive queries while ensuring the effectiveness of general queries. We will make the code and dataset publicly available at https://github.com/suzhou-22/TS-Retriever.
To bridge the gap between powerful Graph Neural Networks (GNNs) and lightweight Multi-Layer Perceptrons (MLPs), GNN-to-MLP Knowledge Distillation (KD) proposes to distill knowledge from a well-trained teacher GNN into a student MLP. In this paper, we revisit the knowledge samples (nodes) in teacher GNNs from the perspective of hardness, and identify that hard sample distillation may be a major performance bottleneck of existing graph KD algorithms. GNN-to-MLP KD involves two different types of hardness: student-free knowledge hardness, describing the inherent complexity of GNN knowledge, and student-dependent distillation hardness, describing the difficulty of teacher-to-student distillation. However, most of the existing work focuses on only one of these aspects or treats them as a single notion. This paper proposes a simple yet effective <u>H</u>ardness-aware <u>G</u>NN-to-<u>M</u>LP <u>D</u>istillation (HGMD) framework, which decouples the two hardnesses and estimates them using a non-parametric approach. Finally, two hardness-aware distillation schemes (i.e., HGMD-weight and HGMD-mixup) are further proposed to distill hardness-aware knowledge from teacher GNNs into the corresponding nodes of student MLPs. As non-parametric distillation, HGMD does not involve any additional learnable parameters beyond the student MLPs, but it still outperforms most of the state-of-the-art competitors. HGMD-mixup improves over the vanilla MLPs by 12.95% and outperforms its teacher GNNs by 2.48% averaged over seven real-world datasets. Codes will be made public at https://github.com/LirongWu/HGMD.
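A hedged stand-in for hardness-aware distillation (not HGMD's exact non-parametric estimators): teacher entropy serves as knowledge hardness, teacher-student KL divergence as distillation hardness, and the per-node KD loss is reweighted accordingly.

```python
# Hardness-weighted GNN-to-MLP distillation sketch with invented weighting:
# easy (low-entropy) teacher knowledge gets a larger weight in the KD loss.
import torch
import torch.nn.functional as F

def hardness_weighted_kd(teacher_logits, student_logits, tau=2.0):
    t = F.softmax(teacher_logits / tau, dim=-1)
    s = F.log_softmax(student_logits / tau, dim=-1)
    kd_per_node = F.kl_div(s, t, reduction="none").sum(-1)        # distillation hardness
    knowledge_hardness = -(t * t.clamp_min(1e-9).log()).sum(-1)   # teacher entropy
    weights = torch.exp(-knowledge_hardness).detach()             # easier knowledge => larger weight
    return (weights * kd_per_node).mean()

teacher = torch.randn(10, 7)                       # per-node class logits from a trained GNN
student = torch.randn(10, 7, requires_grad=True)   # per-node logits from the student MLP
loss = hardness_weighted_kd(teacher, student)
loss.backward()
print(float(loss))
```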
In the realm of information retrieval, users often engage in multi-turn interactions with search engines to acquire information, leading to the formation of sequences of user feedback behaviors. Leveraging the session context has proven to be beneficial for inferring user search intent and document ranking. A multitude of approaches have been proposed to exploit in-session context for improved document ranking. Despite these advances, the limitation of historical session data for capturing evolving user intent remains a challenge. In this work, we explore the integration of future contextual information into the session context to enhance document ranking. We present a siamese model optimization framework comprising a history-conditioned model and a future-aware model. The former processes only the historical behavior sequence, while the latter integrates both historical and anticipated future behaviors. Both models are trained collaboratively using the supervised labels and pseudo labels predicted by the other. The history-conditioned model, referred to as ForeRanker, progressively learns future-relevant information to enhance ranking, while using only the historical session at inference time. To mitigate inconsistencies during training, we introduce a peer knowledge distillation method with a dynamic gating mechanism, allowing models to selectively incorporate contextual information. Experimental results on benchmark datasets demonstrate the effectiveness of our ForeRanker, showcasing its superior performance compared to existing methods.
Contrastive learning (CL) has emerged as a promising approach for representation learning in time series data by embedding similar pairs closely while distancing dissimilar ones. However, existing CL methods often introduce false negative pairs (FNPs) by neglecting inherent characteristics and then randomly selecting distinct segments as dissimilar pairs, leading to erroneous representation learning, reduced model performance, and overall inefficiency. To address these issues, we systematically define and categorize FNPs in time series into semantic false negative pairs and temporal false negative pairs for the first time: the former arise from overlooking similarities in label categories, which correlate with similarities in non-stationarity, and the latter from neglecting temporal proximity. Moreover, we introduce StatioCL, a novel CL framework that captures non-stationarity and temporal dependency to mitigate both FNPs and rectify the inaccuracies in learned representations. By interpreting and differentiating non-stationary states, which reflect the correlation between trends or temporal dynamics and underlying data patterns, StatioCL effectively captures the semantic characteristics and eliminates semantic FNPs. Simultaneously, StatioCL establishes fine-grained similarity levels based on temporal dependencies to capture varying temporal proximity between segments and to mitigate temporal FNPs. Evaluated on real-world benchmark time series classification datasets, StatioCL demonstrates a substantial improvement over state-of-the-art CL methods, achieving a 2.9% increase in Recall and a 19.2% reduction in FNPs. Most importantly, StatioCL also shows enhanced data efficiency and robustness against label scarcity.
Conversational recommender systems (CRSs) aim to capture user preferences and provide personalized recommendations through multi-round natural language dialogues. However, most existing CRS models mainly focus on dialogue comprehension and preference mining from the current dialogue session, overlooking user preferences in historical dialogue sessions. The preferences embedded in historical sessions and the current session exhibit continuity and sequentiality, and we refer to such CRSs as sequential CRSs. In this work, we leverage memory-enhanced LLMs to model the preference continuity, addressing two key issues: (1) redundancy and noise in historical dialogue sessions, and (2) the cold-start user problem. Thus, we propose a <u>Memo</u>ry-enhanced <u>C</u>onversational <u>R</u>ecommender <u>S</u>ystem Framework with Large Language Models (dubbed MemoCRS), consisting of user-specific memory and general memory. User-specific memory is tailored to each user's interests and uses an entity-based memory bank to refine preferences and retrieve relevant memory, thereby reducing the redundancy and noise of historical sessions. The general memory, encapsulating collaborative knowledge and reasoning guidelines, can provide shared knowledge for users, especially cold-start users. With the above memory, LLMs are empowered to deliver more precise and tailored recommendations for each user. Extensive experiments on Chinese and English datasets demonstrate MemoCRS's effectiveness.
Explaining black-box models is fundamental to gaining trust and deploying these models in real applications. As existing explanation methods have been shown to lack robustness against adversarial perturbations, there has been a growing interest in generating robust explanations. However, existing works resort to empirical defense strategies, and these heuristic methods fail against powerful adversaries. In this paper, motivated by the success of randomized smoothing, we certify the robustness of explanations. Specifically, we compute a tight radius in which the robustness of the explanation is certified. A key challenge is how to formulate the robustness of an explanation mathematically; we address it by quantizing the explanation into discrete spaces to mimic the classification setting of randomized smoothing. To address the high computational cost of randomized smoothing, we introduce randomized gradient smoothing. Also, we explore the robustness of semantic explanations by certifying the robustness of capsules. In the experiments, we demonstrate the effectiveness of our method on benchmark datasets from the perspectives of post-hoc explanation and semantic explanation, respectively. Our work is a promising step towards filling the gap between the theoretical robustness bound and empirical explanations. Our code has been released at https://github.com/NKUShaw/CertifiedExplanation.
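The smoothing step itself can be sketched as averaging input gradients under Gaussian noise; the certification machinery (radius computation, quantization into discrete spaces) is not reproduced here, and the model and hyperparameters below are placeholders.

```python
# Gradient smoothing sketch: average the input gradient over noisy copies of
# the input. This is the smoothing operation a certified radius would be
# derived from, not the paper's full certification procedure.
import torch

def smoothed_gradient(model, x, n_samples=32, sigma=0.25):
    grads = []
    for _ in range(n_samples):
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        model(noisy).sum().backward()
        grads.append(noisy.grad.detach())
    return torch.stack(grads).mean(0)

model = torch.nn.Sequential(torch.nn.Linear(10, 1))  # stand-in black-box model
x = torch.randn(1, 10)
print(smoothed_gradient(model, x))
```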
Federated learning on graphs (a.k.a., federated graph learning - FGL) has recently received increasing attention due to its capacity to enable collaborative learning over distributed graph datasets without compromising local clients' data privacy. In previous works, clients of FGL typically represent institutes or organizations that possess sets of entire graphs (e.g., molecule graphs in biochemical research) or parts of a larger graph (e.g., sub-user networks of e-commerce platforms). However, another natural paradigm exists where clients act as remote devices retaining the graph structures of local neighborhoods centered around the device owners (i.e., ego-networks), which can be modeled for specific graph applications such as user profiling on social ego-networks and infection prediction on contact ego-networks. FGL in such novel yet realistic ego-network settings faces the unique challenge of incomplete neighborhood information for non-ego local nodes since they likely appear and have different sets of neighbors in multiple ego-networks. To address this challenge, we propose an FGL method for distributed ego-networks in which clients obtain complete neighborhood information of local nodes through sharing node embeddings with other clients. A contrastive learning mechanism is proposed to bridge the gap between local and global node embeddings and stabilize the local training of graph neural network models, while a secure embedding sharing protocol is employed to protect individual node identity and embedding privacy against the server and other clients. Comprehensive experiments on various distributed ego-network datasets successfully demonstrate the effectiveness of our proposed embedding sharing method on top of different federated model sharing frameworks, and we also provide discussions on the potential efficiency and privacy drawbacks of the method as well as their future mitigation.
Sequential recommendation has attracted increasing attention due to its ability to accurately capture the dynamic changes in user interests. We have noticed that generative models, especially diffusion models, which have achieved significant results in fields like image and audio, hold considerable promise in the field of sequential recommendation. However, existing sequential recommendation methods based on diffusion models are constrained by a prior limited to the Gaussian distribution, which hinders the introduction of user-specific information into each recommendation and leads to information loss. To address these issues, we introduce the Schrödinger Bridge into diffusion-based sequential recommendation models, creating the SdifRec model. This allows us to replace the Gaussian prior of the diffusion model with the user's current state, directly modeling the process from a user's current state to the target recommendation. Additionally, to better utilize collaborative information in recommendations, we propose an extended version of SdifRec called con-SdifRec, which utilizes user clustering information as a guiding condition to further enhance the posterior distribution. Finally, extensive experiments on multiple public benchmark datasets have demonstrated the effectiveness of SdifRec and con-SdifRec through comparison with several state-of-the-art methods. Further in-depth analysis has validated their efficiency and robustness.
Image-text retrieval (ITR) has been one of the primary tasks in cross-modal retrieval, serving as a crucial bridge between computer vision and natural language processing. Significant progress has been made to achieve global alignment and local alignment between images and texts by mapping images and texts into a common space to establish correspondences between these two modalities. However, the rich semantic content contained in each image may bring false matches, resulting in the matched text ignoring the main semantics but focusing on the secondary or other semantics of this image. To address this issue, this paper proposes a semantically optimized approach with a novel Main Semantics Consistency (MSC) loss function, which aims to rank the semantically most similar images (or texts) corresponding to the given query at the top position during the retrieval process. First, in each batch of image-text pairs, we separately compute (i) the image-image similarity, i.e., the similarity between every two images, (ii) the text-text similarity, i.e., the similarity between a group of texts (that belong to a certain image) and another group of texts (that belong to another image), and (iii) the image-text similarity, i.e., the similarity between each image and each text. Afterward, our proposed MSC effectively aligns the above image-image, image-text, and text-text similarity, since the main semantics of every two images will be highly close if their text descriptions remain highly semantically consistent. By this means, we can capture the main semantics of each image to be matched with its corresponding texts, prioritizing the semantically most related retrieval results. Extensive experiments on MSCOCO and FLICKR30K verify the superior performance of MSC compared with the SOTA image-text retrieval methods. The source code of this project is released at GitHub: https://github.com/xyi007/MSC.
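A simplified stand-in for the alignment idea behind an MSC-style loss: within a batch, the image-image, text-text, and image-text similarity matrices are pushed toward one another, so two images whose captions agree are also treated as semantically close. The exact formulation in the paper differs.

```python
# Illustrative alignment of batch-level similarity matrices (not the paper's
# exact MSC loss): image-image and image-text similarities are regressed
# toward the text-text similarities.
import torch
import torch.nn.functional as F

def msc_style_loss(img_emb, txt_emb):
    i = F.normalize(img_emb, dim=-1)
    t = F.normalize(txt_emb, dim=-1)
    sim_ii, sim_tt, sim_it = i @ i.T, t @ t.T, i @ t.T
    return F.mse_loss(sim_ii, sim_tt) + F.mse_loss(sim_it, sim_tt)

img = torch.randn(8, 128, requires_grad=True)   # stand-in image embeddings
txt = torch.randn(8, 128)                        # stand-in text embeddings
loss = msc_style_loss(img, txt)
loss.backward()
print(float(loss))
```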
The Multi-Party Conversation (MPC) system has gained attention for its relevance in modern communication. Recent work has focused on developing specialized models for different MPC subtasks, improving state-of-the-art (SOTA) performance. However, since MPC demands often arise collaboratively, managing multiple specialized models is impractical. Additionally, dialogue evolves through diverse meta-information, where knowledge from specific subtasks can influence others. To address this, we propose UniMPC, a unified framework that consolidates common MPC subtasks. UniMPC uses a graph network with utterance nodes, a global node for combined local and global information, and two adaptable free nodes. It also incorporates discourse parsing to enhance model updates. We introduce MPCEval, a new benchmark for evaluating MPC systems. Experiments show UniMPC achieves over 95% of SOTA performance across all subtasks, with some surpassing existing SOTA, highlighting the effectiveness of the global node, free nodes, and dynamic discourse-aware graphs.
Community detection plays a pivotal role in network analysis, with applications in recommendation systems, anomaly detection, and biochemistry. However, traditional methods, while computationally efficient, often fall short in managing the complexities of real-world network structures. In contrast, deep learning approaches enhance accuracy but require substantial computational resources and task-specific architectures. This paper introduces GetCom, a novel three-phase "pre-train, generate, prompt" framework that integrates traditional methods and deep learning techniques. In the pre-training phase, GetCom acquires a comprehensive understanding of community structures, which provides a solid foundation for the subsequent phases. During the generation phase, traditional community detection methods are employed to efficiently identify potential communities, which are subsequently refined in the prompt learning phase. This integration offers an efficient, accurate, and generalizable solution for community detection. Experiments on five real-world network datasets demonstrate that GetCom achieves state-of-the-art performance, with strong efficiency and generalization capabilities across diverse datasets and tasks.
Model editing aims to precisely alter the behaviors of large language models (LLMs) in relation to specific knowledge, while leaving unrelated knowledge intact. This approach has proven effective in addressing issues of hallucination and outdated information in LLMs. However, the potential of using model editing to modify knowledge in the medical field remains largely unexplored, even though resolving hallucination is a pressing need in this area. Our observations indicate that current methods face significant challenges in dealing with specialized and complex knowledge in the medical domain. Therefore, we propose MedLaSA, a novel Layer-wise Scalable Adapter strategy for medical model editing. MedLaSA harnesses the strengths of both adding extra parameters and locate-then-edit methods for medical model editing. We utilize causal tracing to identify the association of knowledge in neurons across different layers, and generate a corresponding scale set from the association values for each piece of knowledge. Subsequently, we incorporate scalable adapters into the dense layers of LLMs. These adapters are assigned scaling values based on the corresponding specific knowledge, which allows for the adjustment of the adapter's weight and rank. The more similar the content, the more consistent the scale between them. This ensures precise editing of semantically identical knowledge while avoiding impact on unrelated knowledge. To evaluate the editing impact on the behaviors of LLMs, we propose two model editing studies for the medical domain: (1) editing factual knowledge for medical specialization and (2) editing the explanatory ability for complex knowledge. We build two novel medical benchmarking datasets and introduce a series of challenging and comprehensive metrics. Extensive experiments on medical LLMs demonstrate the editing efficiency of MedLaSA, without affecting unrelated knowledge.
Understanding neurological disorders is a fundamental problem in neuroscience, which often requires the analysis of brain networks derived from functional magnetic resonance imaging (fMRI) data. Despite the prevalence of Graph Neural Networks (GNNs) and Graph Transformers in various domains, applying them to brain networks faces challenges. Specifically, the datasets are severely impacted by noise caused by distribution shifts across sub-populations and by the neglect of node identities, both of which obstruct the identification of disease-specific patterns. To tackle these challenges, we propose Contrasformer, a novel contrastive brain network Transformer. It generates a prior-knowledge-enhanced contrast graph to address the distribution shifts across sub-populations by a two-stream attention mechanism. A cross attention with identity embedding highlights the identity of nodes, and three auxiliary losses ensure group consistency. Evaluated on 4 functional brain network datasets over 4 different diseases, Contrasformer outperforms the state-of-the-art methods for brain networks by achieving up to 10.8% improvement in accuracy, which demonstrates its efficacy in neurological disorder identification. Case studies illustrate its interpretability, especially in the context of neuroscience. This paper provides a solution for analyzing brain networks, offering valuable insights into neurological disorders. Our code is available at https://github.com/AngusMonroe/Contrasformer.
Group activities are important behaviors in human society, and providing personalized recommendations for groups is referred to as the group recommendation task. Existing methods can usually be categorized into two strategies to infer group preferences: 1) determining group preferences by aggregating members' personalized preferences, and 2) inferring group consensus by capturing group members' coherent decisions after common compromises. However, the former suffers from the lack of group-level considerations, and the latter overlooks the fine-grained preferences of individual users. To this end, we propose a novel group recommendation method, AlignGroup, which focuses on both group consensus and individual preferences of group members to infer group decision-making. Specifically, AlignGroup explores group consensus through a well-designed hypergraph neural network that efficiently learns intra- and inter-group relationships. Moreover, AlignGroup innovatively utilizes a self-supervised alignment task to capture fine-grained group decision-making by aligning the group consensus with members' common preferences. Extensive experiments on two real-world datasets validate that our AlignGroup outperforms the state-of-the-art on both the group recommendation task and the user recommendation task, and is also more efficient than most baselines.
Spectral Graph Neural Networks (GNNs) are gaining attention for their ability to surpass the limitations of message-passing GNNs. They rely on supervision from downstream tasks to learn spectral filters that capture useful graph frequency information. Some works empirically show that the preferred graph frequency is related to the graph homophily level, yet the relationship between graph frequency and graph homophily level has not been systematically analyzed and explored in existing spectral GNNs. To fill this gap, we conduct theoretical and empirical analyses revealing a positive correlation between low-frequency importance and the homophily ratio, and a negative correlation between high-frequency importance and the homophily ratio. Motivated by this, we propose NewtonNet, a Newton Interpolation-based spectral filter with shape-aware regularization that can (i) learn an arbitrary polynomial spectral filter; and (ii) incorporate prior knowledge about the desired shape for the corresponding homophily level. Comprehensive experiments demonstrate that NewtonNet can achieve graph spectral filters with desired shapes and superior performance on both homophilous and heterophilous datasets. Our code is available at https://github.com/junjie-xu/NewtonNet.
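The Newton-interpolation ingredient can be sketched as follows: filter values at fixed points in [0, 2] (the range of normalized-Laplacian eigenvalues) define, via divided differences, a polynomial that is applied to the Laplacian. The shapes, interpolation points, and filter values below are toy placeholders, and the shape-aware regularization is omitted.

```python
# Newton-interpolation polynomial filter sketch: divided differences turn
# target filter values g(x_k) into a polynomial g, which is applied to the
# normalized Laplacian via Horner-style evaluation on the feature matrix.
import numpy as np

def divided_differences(x, y):
    coef = y.astype(float)
    for j in range(1, len(x)):
        coef[j:] = (coef[j:] - coef[j - 1:-1]) / (x[j:] - x[:-j])
    return coef  # Newton-form coefficients

def newton_filter(L, x, coef, features):
    out = coef[-1] * features
    for k in range(len(coef) - 2, -1, -1):
        out = (L - x[k] * np.eye(L.shape[0])) @ out + coef[k] * features
    return out

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # toy 3-node path graph
D_inv_sqrt = np.diag(1 / np.sqrt(A.sum(1)))
L = np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt                   # normalized Laplacian
x = np.linspace(0, 2, 4)                                      # interpolation points
g_vals = np.array([1.0, 0.6, 0.3, 0.1])                       # low-pass-shaped filter values
coef = divided_differences(x, g_vals)
print(newton_filter(L, x, coef, np.random.default_rng(0).normal(size=(3, 2))))
```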
With the widespread development of database systems, data security has become crucial when it comes to sharing among users and servers. A straightforward approach involves using searchable encryption to ensure the confidentiality of shared data. However, in certain scenarios, varying user tiers are granted disparate data searching privileges, and administrators need to restrict the searchability of ciphertexts to select users exclusively. To address this issue, public key encryption with authorized keyword search (PEAKS) was proposed, wherein solely authorized users possess the ability to conduct targeted keyword searches. Nonetheless, it remains vulnerable to quantum computing attacks. As a result, research on authorizing users to search for keywords while achieving quantum security is of great significance. In this paper, we propose a lattice-based variant of PEAKS (L-PEAKS) that enables keyword dataset authorization for outsourced data management. Unlike existing schemes, our design incorporates identity-based encryption (IBE) to overcome the bottleneck of public key management. Besides, we utilize several lattice sampling algorithms to defend against attacks from quantum adversaries. Specifically, each authorized user must obtain a search privilege from an authority. The authority distributes an authorized token to the user within a specific time period, and the user generates a trapdoor for any authorized keywords. Our scheme is proven to achieve IND-sID-CKA and T-EUF security in a quantum setting. We also conduct comprehensive evaluations on a commodity machine to assess completeness and provide theoretical complexity comparisons with existing state-of-the-art schemes.
Identifying disinformation from online social media is crucial for maintaining a credible cyberspace. Although features from the content and propagation topology are widely exploited by existing studies to distinguish disinformation from normal information, they are becoming less effective, as content can be intentionally written to mislead readers and topological features are difficult to extract due to the high variance and diversity of reposting trees. Moreover, related works mainly focus on modeling the complete information propagation event, ignoring the staged evolution patterns along with propagation, which may also degrade the detection performance. In this paper, we conceive and implement a novel framework called DMPS for identifying disinformation, which Dynamically Models diverse topological structures of reposting trees as well as the textual content streams across different Propagation Stages. In particular, DMPS learns expressive representations of the structural features via meta-trees and extracts sequential features of the content for intra-stage modeling, then it captures temporal dependencies for inter-stage modeling. The whole framework is optimized in a binary classification manner. Experiments based on multilingual social media datasets validate the effectiveness and superiority of DMPS over state-of-the-art models. We believe that this study can provide insights for crisis management in response to disinformation campaigns in social networks.
Single-cell sequencing technologies have revolutionized genomics by enabling the simultaneous profiling of various molecular modalities within individual cells. Their integration, especially cross-modality translation, offers deep insights into cellular regulatory mechanisms. Many methods have been developed for cross-modality translation, but their reliance on scarce high-quality co-assay data limits their applicability. To address this, we introduce scACT, a deep generative model designed to extract cross-modality biological insights from unpaired single-cell data. scACT tackles three major challenges: aligning unpaired multi-modal data via adversarial training, facilitating cross-modality translation without prior knowledge via cycle-consistent training, and enabling interpretable exploration of regulatory interconnections via in-silico perturbations. To test its performance, we applied scACT to diverse single-cell datasets and found that it outperformed existing methods in all three tasks. Finally, we have developed scACT as a standalone open-source software package to advance single-cell omics data processing and analysis within the research community.
Pre-trained language models (PLMs) have established a new paradigm in the field of NLP. For more powerful PLMs, one of the most popular and successful ways is to continuously scale up the sizes of the models and the pre-training corpora. These large corpora, typically obtained by merging smaller ones from multiple sources, are thus growing increasingly diverse. However, colossal merged corpora do not always enhance PLMs' performance. In this paper, we identify the disadvantage of heterogeneous corpora from multiple sources for pre-training PLMs. Towards coordinated pre-training on diverse corpora, we further propose Source Prompt (SP), which explicitly prompts the model with the source of data at the pre-training and fine-tuning stages. Extensive experimental results show that pre-training PLMs with SP on diverse corpora significantly improves performance in various downstream tasks.
The automatic generation of radiological imaging reports aims to produce accurate and coherent clinical descriptions based on X-ray images. This facilitates clinicians in completing the arduous task of report writing and advances clinical automation. The primary challenge in radiological imaging report generation lies in accurately capturing and describing abnormal regions in the images under data bias conditions while generating lengthy texts that cover image details. Existing methods mostly rely on prior knowledge such as medical knowledge graphs, corpora, and image databases to assist models in generating more precise textual descriptions. However, these methods still struggle to identify rare anomalies in the images. To address this issue, we propose a two-stage training model, named CLR2G, based on cross-modal contrastive learning. This model delegates the task of capturing anomalies, particularly those challenging for a generative model trained with cross-entropy loss under data bias conditions, to a specialized abnormality capture component. Specifically, we employ a semantic matching loss function to train additional abnormal image and text encoders through cross-modal contrastive learning, facilitating the capture of 13 common anomalies. We utilize the anomalous image features, text features and their confidence probabilities as posterior knowledge to help the model generate accurate image reports. Experimental results demonstrate the state-of-the-art performance of our method on two widely used public datasets, IU-Xray and MIMIC-CXR.
Learning to answer multi-step complex questions requires machines to perform like a human to think and reason step by step, which is one of the core abilities of a question answering system. Recent advancements have revealed that large language models exhibit remarkable reasoning capabilities by generating intermediate chain-of-thought rationales. However, the completeness of their rationales lacks assurance as they are susceptible to omitting steps and making factual errors. In this paper, drawing inspiration from human-like reasoning processes in answering multi-step questions, we explicitly plan the rationales to ensure their completeness. We propose a two-stage Decomposition-Evaluation (Dec-Eval) framework including a step decomposition stage and a rationale generation stage. Specifically, in the first stage, we decompose the complex question into simpler sub-ones and simulate a human's ability to grasp logical clues to ensure the integrity of step planning. Then, in the second stage, based on the sub-questions, we generate and evaluate rationales step by step. Both stages work together organically, improving the completeness of rationales and the accuracy of the answer. To further control the question answering process, we propose a novel knowledge injection mechanism that incorporates external knowledge to guide both stages. Extensive experiments on three challenging multi-step QA datasets demonstrate that Dec-Eval can explicitly generate more logical rationales, and significantly improve the reasoning performances of different backbone models.
The scientific impact of academic papers is influenced by intricate factors such as dynamic popularity and inherent contribution. Existing models typically rely on static graphs for citation count estimation, failing to differentiate among its sources. In contrast, we propose distinguishing effects derived from various factors and predicting citation increments as estimated potential impacts within the dynamic context. In this research, we introduce a novel model, DPPDCC, which Disentangles the Potential impacts of Papers into Diffusion, Conformity, and Contribution values. It encodes temporal and structural features within dynamic heterogeneous graphs derived from the citation networks and applies various auxiliary tasks for disentanglement. By emphasizing comparative and co-cited/citing information and aggregating snapshots evolutionarily, DPPDCC captures knowledge flow within the citation network. Afterwards, popularity is outlined by contrasting augmented graphs to extract the essence of citation diffusion and predicting citation accumulation bins for quantitative conformity modeling. Orthogonal constraints ensure distinct modeling of each perspective, preserving the contribution value. To gauge generalization across publication times and replicate the realistic dynamic context, we partition data based on specific time points and retain all samples without strict filtering. Extensive experiments on three datasets validate DPPDCC's superiority over baselines for papers published previously, freshly, and immediately, with further analyses confirming its robustness. Our codes and supplementary materials can be found at GitHub (https://github.com/ECNU-Text-Computing/DPPDCC).
Federated learning (FL) enables collaborative learning across multiple biomedical data silos with multimodal foundation models while preserving privacy. Due to the heterogeneity in data processing and collection methodologies across diverse medical institutions and the varying medical inspections patients undergo, modal heterogeneity exists in practical scenarios, where severe modal heterogeneity may even prevent model training. With privacy considerations, data transfer cannot be permitted, restricting knowledge exchange among different clients. To tackle these issues, we propose a cross-modal prototype imputation method for visual-language understanding (Buffalo) with only a slight increase in communication cost, which can improve the performance of fine-tuning general foundation models for downstream biomedical tasks. We conducted extensive experiments on medical report generation and biomedical visual question-answering tasks. The results demonstrate that Buffalo can fully utilize data from all clients to improve model generalization compared to other modal imputation methods in three modal heterogeneity scenarios, approaching or even surpassing the performance in the ideal scenario without missing modality.
Multi-task learning (MTL) has become increasingly prevalent in e-commerce recommender systems. However, existing MTL methods, particularly those utilizing the Multi-gate Mixture-of-Experts (MMoE) architecture, face challenges due to their implicit routing mechanisms. These mechanisms can inadvertently lead to negative knowledge transfer, failing to resolve conflicts among tasks and resulting in gradient contradictions on shared parameters. Such issues undermine the generalization capability of MTL models across various tasks. To address these limitations, we introduce the Task Information Decoupling Model (TIDM), designed to alleviate negative transfer by decoupling task knowledge. TIDM incorporates two innovative modules following the expert layer: the Maximize Information Aggregation Module (MIA) and the Automatic Information Selection Module (AIS). The MIA module employs an auxiliary loss to filter out irrelevant task information and aggregates task-specific knowledge using a dissimilar self-attention network. Subsequently, the AIS module automatically selects the most pertinent task-specific information to facilitate task tower learning. Our experiments demonstrate that TIDM outperforms five contemporary MTL models across two datasets, showcasing its effectiveness in extracting task-specific information. This advancement is crucial for enhancing the performance of recommender systems in e-commerce and other complex domains.
Network embedding is a commonly used technique in graph mining and plays an important role in a variety of applications. Most network embedding works fall into the category of positional node embedding methods, which aim at capturing the proximity/relative position of node pairs. Recently, structural node embedding has attracted tremendous research interest; it is intended to perceive the local structural information of nodes, i.e., nodes can share similar local structures in different positions of a graph. Although numerous structural node embedding methods are designed to encode such structural information, most, if not all, of these methods cannot simultaneously achieve the following three desired properties: (1) a bijective mapping between a node's embedding and its local structure; (2) inductive capability; and (3) good interpretability of node embeddings. To address this challenge, in this paper, we propose a novel structural node embedding algorithm named topological anonymous walk embedding (TAWE). Specifically, TAWE creatively integrates anonymous walks and breadth-first search (BFS) to construct the bijective mapping between node embedding and the local structure of a node. In addition, TAWE possesses inductive capability and good interpretability of node embeddings. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of the proposed TAWE algorithm in both structural node classification and structural node clustering tasks.
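To make the anonymous-walk ingredient concrete, the toy snippet below relabels each random walk by the first-occurrence order of its nodes and builds a pattern histogram per node; TAWE's actual construction (which combines anonymous walks with BFS to obtain a bijective mapping) is more involved.

```python
# Anonymous walks: a walk is relabeled by the first-occurrence order of its
# nodes, so walks from structurally similar nodes map to the same pattern
# regardless of node identity. The histogram here is only an illustration.
import random
from collections import Counter

def anonymize(walk):
    seen = {}
    return tuple(seen.setdefault(v, len(seen)) for v in walk)

def aw_histogram(adj, node, walk_len=4, n_walks=200, rng=random.Random(0)):
    patterns = Counter()
    for _ in range(n_walks):
        walk, cur = [node], node
        for _ in range(walk_len):
            cur = rng.choice(adj[cur])
            walk.append(cur)
        patterns[anonymize(walk)] += 1
    return patterns

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}   # toy adjacency lists
print(aw_histogram(adj, 0).most_common(3))
```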
Accurately predicting Vehicle Energy Consumption (VEC) is crucial for estimating a vehicle's total energy requirements along a predetermined trajectory. Current research mainly focuses on personalized models that enhance VEC prediction accuracy by leveraging driving behavior features extracted from historical trajectory data. However, there are still two significant limitations. First, existing algorithms predominantly model trajectories with coarse granularity, focusing solely on overall characteristics and neglecting the crucial interplay between vehicles, drivers, and environments, which fundamentally shapes trajectory dynamics. Second, current models predict driver behavior preferences solely from vehicle operational states in historical trajectories, often overlooking the influence of external environmental factors. To overcome these limitations, we introduce a Spatial-Temporal Framework for Energy Consumption Prediction of Vehicle Trajectories (ST-ECP). Specifically, we construct a heterogeneous interaction graph that captures the complex relationships between vehicles, environments, and drivers, effectively characterizing the dynamic attributes of trajectories across various conditions. Additionally, we design a personalized pattern aggregation module to extract personalized driving behavior features. Extensive experiments on real-world datasets demonstrate the effectiveness and efficiency of ST-ECP.
Graphlets are small, connected, non-isomorphic induced subgraphs that describe the topological structure of a graph. Counting graphlets is a fundamental task in graph mining and social network analysis, with numerous applications in many fields, including dense subgraph discovery, anomaly detection, etc. Most existing work assumes a static graph. However, graphs in the real world are dynamic and can be described as graph streams. Counting graphlets in graph streams is challenging due to the streaming nature of the input. While there have been several studies on counting graphlets in graph streams, these works are limited to simple graphlets like triangles and butterflies. In this paper, we propose the SGES algorithm to estimate more complex graphlets in graph streams. In SGES, we first propose an unbiased sampling strategy to maintain fixed-size sampled edges, which in turn allows us to unbiasedly estimate the number of subgraphs, and then count graphlets based on the combinatorial relationship between the number of subgraphs and the number of graphlets. Extensive experiments over large real-world graph streams show that our algorithm obtains accurate estimates of graphlet counts with high throughput.
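The flavor of fixed-size unbiased edge sampling can be illustrated with a reservoir sample plus inverse-inclusion-probability scaling, shown here for triangles; SGES's estimator for more complex graphlets is more elaborate, and all stream data below are made up.

```python
# Reservoir-sampling sketch for streamed graphlet estimation: keep a uniform
# sample of M edges, count graphlets (here triangles) inside the sample, and
# scale by the inverse probability that all of a graphlet's edges survived.
import random
from itertools import combinations

def reservoir_edges(stream, M, rng=random.Random(0)):
    sample = []
    for t, edge in enumerate(stream, start=1):
        if len(sample) < M:
            sample.append(edge)
        elif rng.random() < M / t:
            sample[rng.randrange(M)] = edge
    return sample, t

def estimate_triangles(sample, t):
    M = len(sample)
    edge_set = {frozenset(e) for e in sample}
    nodes = {u for e in sample for u in e}
    in_sample = sum(1 for a, b, c in combinations(sorted(nodes), 3)
                    if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edge_set)
    scale = (t * (t - 1) * (t - 2)) / (M * (M - 1) * (M - 2))  # inverse inclusion prob.
    return in_sample * max(1.0, scale)

stream = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (0, 3), (1, 3)]
sample, t = reservoir_edges(stream, M=6)
print(estimate_triangles(sample, t))
```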
Illegal parking prediction is a crucial problem that helps stakeholders with better urban planning and management. Existing works advance the field by capturing complex traffic correlations from spatial and temporal perspectives using deep learning models, and achieve state-of-the-art performance. However, current works do not consider the unique perspective of the illegal parking data collection process carried out by patrol officers, which can reflect a wealth of knowledge gained from each officer's on-the-ground experience for more effective patrol. In this paper, we propose a novel behavior-aware hypergraph convolutional network named BHIPP for city-wide illegal parking prediction. To better represent the correlations of illegal parking events from patrol officers' perspective, we construct a new patrol hypergraph integrating patrol officers' experience alongside multi-source contextual information. Additionally, we design a behavior-aware hypergraph convolutional network, which captures the complex and high-order illegal parking event correlations with officers' patrol behaviors explicitly considered. Further, we introduce a spatial-temporal illegal parking approximation module to estimate parking violations in under-patrolled regions using both historical and multi-source contextual data. Extensive experiments on real-world datasets demonstrate the superiority of our proposed BHIPP compared with a broad range of state-of-the-art baseline models across varying spatial-temporal granularities, from both regression and ranking aspects.
Graph Contrastive Learning (GCL) has demonstrated remarkable effectiveness in learning representations on graphs in recent years. To generate ideal augmentation views, the augmentation generation methods should preserve essential information while discarding less relevant details for downstream tasks. However, current augmentation methods usually involve random topology corruption in the spatial domain, which fails to adequately address information spread across different frequencies in the spectral domain. Our preliminary study highlights this issue, demonstrating that spatial random perturbations impact all frequency bands almost uniformly. Given that task-relevant information typically resides in specific spectral regions that vary across graphs, this one-size-fits-all approach can pose challenges. We argue that indiscriminate spatial random perturbation might unintentionally weaken task-relevant information, reducing its effectiveness.
To tackle this challenge, we propose applying perturbations selectively, focusing on information specific to different frequencies across diverse graphs. In this paper, we present GASSER, a model that applies tailored perturbations to specific frequencies of graph structures in the spectral domain, guided by spectral hints. Through extensive experimentation and theoretical analysis, we demonstrate that the augmentation views generated by GASSER are adaptive, controllable, and intuitively aligned with the homophily ratios and spectrum of graph structures.
Given a query graph, top-k subgraph matching finds up to k matches in a data graph with the highest scores according to a user-defined scoring function. It has wide applications across many fields, including knowledge graphs and social networks. Due to the enormous search space, existing methods are not efficient enough on large graphs. In this paper, we propose PTAB, an efficient algorithm for top-k subgraph matching. It traverses an efficiently pruned search space by topology-aware sub-space score upper bounds computed from a novel hop index, which stores the range of node properties in a constrained multi-hop neighborhood of each node. Additionally, PTAB integrates a cost-aware root selection strategy, which chooses query nodes leading to a search process that utilizes the pruning power of the hop index as much as possible. Furthermore, we use a novel edge-cut strategy to handle general query graphs with cycles. Experimental results on real and synthetic datasets demonstrate that our method outperforms existing methods.
Recommender systems are crucial tools in online applications, assisting users in discovering relevant content efficiently. Recent studies demonstrate that contrastive learning (CL) based methods yield significant results in collaborative filtering (CF) recommendations, due to their ability to address the issue of data sparsity. However, two inherent limitations remain unexplored in these methods: (a) since the datasets commonly used are binary (0: no interaction; 1: interaction), current methods only provide rudimentary modeling of user behaviors in binary form, which fails to capture the complex user-item interactions and relationships in real-world recommendation scenarios; and (b) existing CL-based methods mostly construct contrastive views through heuristic embedding or structure perturbation, which tends to introduce noise or discard important information, leading to decreased representation quality. To address these issues, we propose a Decoupled Behavior-based Contrastive Recommendation model (DBCR) that effectively decouples user behaviors from binary datasets for better user-item interaction modeling. The core idea is to decouple latent user behaviors from unlabelled user-item interactions (binary datasets) and utilize self-supervised contrastive learning to jointly optimize CF-based recommendation. Specifically, we introduce latent behavior variables and embed them into user-item interaction modeling within the generalized expectation-maximization (EM) framework. Moreover, we design a contrastive learning task that constructs a preference view instead of relying on unreasonable perturbation to further improve the learned representation. Experimental results and analyses on three real-world datasets demonstrate the effectiveness and high efficiency of DBCR, with an average improvement of 16.9% over state-of-the-art methods. Our code is available at https://github.com/Du-danger/DBCR.
With the explosion of multimedia content, video moment retrieval (VMR), which aims to detect a video moment that matches a given text query from a video, has been studied intensively as a critical problem. However, the existing VMR framework evaluates video moment retrieval performance, assuming that a video is given, which may not reveal whether the models exhibit overconfidence in the falsely given video. In this paper, we propose the MVMR (Massive Videos Moment Retrieval for Faithfulness Evaluation) task that aims to retrieve video moments within a massive video set, including multiple distractors, to evaluate the faithfulness of VMR models. For this task, we suggest an automated massive video pool construction framework to categorize negative (distractors) and positive (false-negative) video sets using textual and visual semantic distance verification methods. We extend existing VMR datasets using these methods and newly construct three practical MVMR datasets. To solve the task, we further propose a strong informative sample-weighted learning method, CroCs, which employs two contrastive learning mechanisms: (1) weakly-supervised potential negative learning and (2) cross-directional hard-negative learning. Experimental results on the MVMR datasets reveal that existing VMR models are easily distracted by the misinformation (distractors), whereas our model shows significantly robust performance, demonstrating that CroCs is essential to distinguishing positive moments against distractors.
Urban traffic is subject to disruptions that cause extended waiting times and safety issues at signalized intersections. While numerous studies have addressed intelligent traffic systems in the context of various disturbances, traffic signal malfunction, a common real-world occurrence with significant repercussions, has received comparatively limited attention. The primary objective of this research is to mitigate the adverse effects of traffic signal malfunction, such as traffic congestion and collisions, by optimizing the control of neighboring functioning signals. To achieve this goal, this paper presents a novel traffic signal control framework (MalLight), which leverages an Influence-aware State Aggregation Module (ISAM) and an Influence-aware Reward Aggregation Module (IRAM) to achieve coordinated control of surrounding traffic signals. To the best of our knowledge, this study pioneers the application of a Reinforcement Learning (RL)-based approach to address the challenges posed by traffic signal malfunction. Empirical investigations conducted on real-world datasets substantiate the superior performance of our proposed methodology over conventional and deep learning-based alternatives in the presence of signal malfunction, alleviating the reduction in throughput by as much as 48.6%.
Numerous explanation methods have recently been developed to interpret the decisions made by deep neural network (DNN) models. For image classifiers, these methods typically provide an attribution score for each pixel in the image to quantify its contribution to the prediction. However, most of these explanation methods assign attribution scores to pixels independently, even though both humans and DNNs make decisions by analyzing sets of closely related pixels simultaneously. Hence, the attribution score of a pixel should be evaluated jointly, considering both the pixel itself and its structurally similar pixels. We propose a method called IProp, which models each pixel's individual attribution score as a source of explanatory information and explains the image prediction through the dynamic propagation of information across all pixels. To formulate the information propagation, IProp adopts a Markov Reward Process, which guarantees convergence and whose final state yields the desired pixel attribution scores. Furthermore, IProp is compatible with any existing attribution-based explanation method. Extensive experiments on various explanation methods and DNN models verify that IProp significantly improves them on a variety of interpretability metrics.
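As an illustration of propagation via a Markov Reward Process, the sketch below treats the initial per-pixel attributions as rewards and a row-normalized pixel-similarity matrix as the transition kernel; iterating the value update v <- r + gamma * P v converges whenever gamma < 1. The similarity matrix and discount factor here are illustrative assumptions, not IProp's exact construction.

```python
import numpy as np

def propagate_attributions(rewards, similarity, gamma=0.8, tol=1e-8, max_iter=1000):
    """Markov-Reward-Process style propagation of attribution scores.

    rewards:    (n,) initial per-pixel attribution scores (the "reward").
    similarity: (n, n) non-negative pixel-similarity matrix; rows are
                normalized into a stochastic transition matrix.
    Returns the converged scores v satisfying v = r + gamma * P v.
    """
    P = similarity / similarity.sum(axis=1, keepdims=True)  # row-stochastic
    v = np.zeros_like(rewards, dtype=float)
    for _ in range(max_iter):
        v_next = rewards + gamma * P @ v   # value-iteration update
        if np.max(np.abs(v_next - v)) < tol:
            return v_next
        v = v_next
    return v

# Toy example: 4 "pixels"; the last two are structurally similar,
# so the propagated score of pixel 3 rises above its initial zero.
r = np.array([1.0, 0.0, 0.5, 0.0])
S = np.array([[1.0, 0.2, 0.0, 0.0],
              [0.2, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.9],
              [0.0, 0.0, 0.9, 1.0]])
print(propagate_attributions(r, S))
```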
The inclusion of images opens up a security vulnerability in visually-aware recommender systems (VARSs). It can be exploited by unscrupulous parties to upload well-crafted adversarial images for malicious purposes (e.g., promoting their own products for profit). Some studies have focused on attacking VARSs to gain insights into their robustness, but these attacks are still far from practical: they often 1) lack diversity in perturbations, 2) are easily perceived, and 3) have limited transferability, which may lead to an overestimation of defenses in practice. To tackle these problems, we propose to perturb the style of the product, an unnoticeable but important property of visual recommendations. Specifically, we propose a novel Style perturbation-based Practical Attack Framework (SPAF). Unlike existing attacks that change pixels within l∞-norm constraints, SPAF interferes with styles in the latent feature space so that the attack becomes unbounded in the pixel space, reflecting possible real-world perturbations. SPAF formulates the attack objectives as an optimization problem and adopts an adaptive adversarial style transfer network to solve it, so that transferable and imperceptible attacks can be generated. Comprehensive experiments on real-world datasets demonstrate that SPAF significantly outperforms state-of-the-art attacks.
Remote control malware enables cyber attackers to achieve command and control over victim hosts, and is widely employed in ransomware attacks and espionage operations, jeopardizing personal privacy and state security. Effectively detecting such malicious traffic holds high practical value. However, prior works have not adequately addressed the task due to the challenges of encrypted traffic with misleading content, incomplete sessions, and limited labels. To overcome these limitations, in this paper we propose TrafCL, a contrastive learning framework for robust encrypted malicious traffic detection. In TrafCL, we first generate incomplete variants of the input session via Session Augmentation, then extract explicit session features while excluding misleading traffic content via Triple-aspect Session Feature Extraction, and obtain session representations via the Co-attention Session Encoder, which fuses the triple-aspect session features while capturing their interdependence. After that, we use a projection head to obtain the final representations. TrafCL is pre-trained on unlabeled data to learn close representations for complete sessions and their incomplete variants, then fine-tuned on labeled data to detect encrypted malicious traffic. Experimental results show that TrafCL outperforms the best baseline by 11.35% and 6.71% in F1-score on two datasets, respectively.
Cross-Domain Recommendation (CDR) seeks to utilize knowledge from different domains to alleviate the problem of data sparsity in the target recommendation domain, and has been gaining more attention in recent years. Although there have been notable advances in this area, most current methods represent users and items in Euclidean space, which is not ideal for handling long-tail distributed data in recommendation systems. Additionally, adding data from other domains can worsen the long-tail characteristics of the entire dataset, making it harder to train CDR models effectively. Recent studies have shown that hyperbolic methods are particularly suitable for modeling long-tail distributions, which has led us to explore hyperbolic representations for users and items in CDR scenarios. However, due to the distinct characteristics of the different domains, applying hyperbolic representation learning to CDR tasks is quite challenging. In this paper, we introduce a new framework called Hyperbolic Contrastive Learning (HCTS), designed to capture the unique features of each domain while enabling efficient knowledge transfer between domains. We achieve this by embedding users and items from each domain separately and mapping them onto distinct hyperbolic manifolds with adjustable curvatures for prediction. To improve the representations of users and items in the target domain, we develop a hyperbolic contrastive learning module for knowledge transfer. Extensive experiments on real-world datasets demonstrate that hyperbolic manifolds are a promising alternative to Euclidean space for CDR tasks. The codes are available at https://github.com/EnkiXin/hcts.
Federated Learning (FL) is a novel client-server distributed learning framework that can protect data privacy. However, recent works show that FL is vulnerable to poisoning attacks. Many defenses with robust aggregators (AGRs) have been proposed to mitigate the issue, but they have all been broken by advanced attacks. Very recently, renewed robust AGRs have been designed, typically with novel clipping and/or filtering strategies, and they show promising defense performance against advanced poisoning attacks. In this paper, we show that these novel robust AGRs are also vulnerable to carefully designed poisoning attacks. Specifically, we observe that breaking these robust AGRs reduces to bypassing the clipping and/or filtering of malicious clients, and we propose an optimization-based attack framework that leverages this observation. Under this framework, we then design customized attacks against each robust AGR. Extensive experiments on multiple datasets and threat models verify that our proposed optimization-based attack can break the SOTA AGRs. We hence call for novel defenses against poisoning attacks on FL. Code is available at: https://github.com/Yuxin104/BreakSTOAPoisoningDefenses.
Uniform Interpolation (UI) is an advanced non-standard reasoning service that seeks to refine ontologies by creating rewritten modules. These modules, known as uniform interpolants, retain only "relevant names" while preserving their meanings in the absence of other names. UI holds significant potential across various domains where tailored ontology modules are required. However, realizing its full potential demands highly optimized techniques for generating such modules. Previous studies have identified notable challenges in generating uniform interpolants for EL-ontologies, where their computation is substantially more complex and computationally demanding than standard subset modules.
Despite these obstacles, this paper introduces an advanced "forgetting" method tailored for computing uniform interpolants of ELIO-ontologies with ABoxes. We show that with effective normalization and inference strategies, these uniform interpolants can be computed efficiently, matching the speed of standard module computation. A comprehensive evaluation using a prototype implementation of this method achieved a 100% success rate on two major benchmark datasets, Oxford-ISG and BioPortal, with results delivered within seconds. The efficiency of our approach is attributed to our novel linear strategy for introducing definers, in sharp contrast to existing strategies that lead to an exponential increase in definers and computational inefficiency. Our method is unique in its ability to create signature-restricted modules for large-scale ontologies, making it a vital addition to the community's toolkit.
Social event detection refers to extracting relevant message clusters from social media data streams to represent specific events in the real world. Social event detection is important in numerous areas, such as opinion analysis, social safety, and decision-making. Most current methods are supervised and require access to large amounts of data. These methods need prior knowledge of the events and carry a high risk of leaking sensitive information in the messages, making them less applicable in open-world settings. Therefore, conducting unsupervised detection while fully utilizing the rich information in the messages and protecting data privacy remains a significant challenge. To this end, we propose ADP-SEMEvent, an unsupervised social event detection framework that prioritizes privacy. Specifically, ADP-SEMEvent consists of two stages: constructing the private message graph and clustering it. In the first stage, an adaptive differential privacy approach is used to construct a private message graph. In this process, our method adaptively applies differential privacy based on the events occurring each day in an open environment to maximize the use of the privacy budget. In the second stage, to address the reduction in data utility caused by noise, a novel two-dimensional structural entropy minimization algorithm based on optimal subgraphs is used to detect events in the message graph. Notably, this process is unsupervised and does not compromise differential privacy. Extensive experiments on two public datasets demonstrate that ADP-SEMEvent achieves detection performance comparable to state-of-the-art methods while maintaining reasonable privacy budget parameters.
Data from observational studies (OSs) is widely available and readily obtainable yet frequently contains confounding biases. On the other hand, data derived from randomized controlled trials (RCTs) helps to reduce these biases; however, it is expensive to gather, resulting in only small amounts of randomized data. For this reason, effectively fusing observational data and randomized data to better estimate heterogeneous treatment effects (HTEs) has gained increasing attention. However, existing methods for integrating observational data with randomized data require complete observational data, meaning that both treated and untreated subjects must be included in the OSs. This prerequisite confines the applicability of such methods to very specific situations, given that including all subjects, whether treated or untreated, in observational studies is not always achievable. In this paper, we propose a resilient approach to Combine Incomplete Observational data and randomized data for HTE estimation, abbreviated as CIO. CIO is capable of estimating HTEs efficiently regardless of whether the observational data is complete or partial. Concretely, a confounding bias function is first derived using the pseudo-experimental group from the OSs, in conjunction with the pseudo-control group from the RCTs, via an effect estimation procedure. This function is subsequently utilized as a corrective residual to rectify the observed outcomes of the observational data during HTE estimation, combining the available observational data with all the randomized data. To validate our approach, we conduct experiments on a synthetic dataset and two semi-synthetic datasets.
Event Temporal Relation Extraction (ETRE) is paramount but challenging. Within a discourse, event pairs are situated at different distances, or so-called proximity bands. The temporal ordering of event pairs at more remote (i.e., "long") or less remote (i.e., "short") proximity bands is encoded differently. SOTA models have tended to perform well on events situated at either short or long proximity bands, but not both. Nonetheless, real-world, natural texts contain all types of temporal event pairs. In this paper, we present MulCo: Distilling Multi-Scale Knowledge via Contrastive Learning, a knowledge co-distillation approach that shares knowledge across multiple event-pair proximity bands to improve performance on all types of temporal datasets. Our experimental results show that MulCo successfully integrates linguistic cues pertaining to temporal reasoning across both short and long proximity bands and achieves new state-of-the-art results on several ETRE benchmark datasets.
A series of studies applies machine learning to assist cost-based query optimizers in DBMSs, emphasizing the incorporation of uncertainty predictions to guide decision-making. While these approaches have demonstrated advances on some benchmarks, their drawbacks, such as unstable performance, stem from the inherent difficulty of using machine learning models to predict the cost of execution plans and from the lack of exploration of the intrinsic characteristics of suboptimal plans. In this paper, we introduce an alert system for query optimization, built upon cost models, to reduce the selection of regressed plans. The key insight is that predictive uncertainty differs systematically between execution plans that genuinely improve queries and those that cause regressions. We investigate the causes of these differences in uncertainty and design a discriminator to filter out execution plans with higher risks of regression. The alert system can be integrated with various cost models, enhancing the robustness of query optimizers. In our experiments, the system further reduces execution time by 20% compared to learned optimizers, while the proportion of optimized queries sacrificed by the alert system is only 15% of the proportion of regressed queries it eliminates.
For reasons of sustainability and economics, two-sided recommendation platforms must satisfy the needs of both users and providers. Previous studies show that the two sides' needs differ in urgency: providers have relatively long-term exposure demands, while users want short-term, accurate service. However, our empirical study reveals that previous methods for trading off fairness and accuracy often fail to guarantee long-term fairness and short-term accuracy simultaneously in real applications with fluctuating user traffic. In particular, when user traffic is low, user experience often degrades substantially. Our theoretical analysis also confirms that user traffic is a key factor in this trade-off problem. How to guarantee accuracy and fairness under fluctuating user traffic remains an open problem. Inspired by the bankruptcy problem in economics, we propose a novel fairness-aware re-ranking approach named BankFair. Intuitively, BankFair employs the Talmud rule to leverage periods of abundant user traffic to offset periods of user traffic scarcity, ensuring consistent user service in every period while upholding long-term fairness. Specifically, BankFair consists of two modules: (1) employing the Talmud rule to determine the required fairness degree under varying periods of user traffic; and (2) conducting an online re-ranking algorithm based on the fairness degree determined by the Talmud rule. Experiments on two real-world recommendation datasets show that BankFair outperforms all baselines in terms of accuracy and provider fairness.
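The Talmud rule from the bankruptcy literature divides an estate E among claims c_i by applying constrained equal awards to the half-claims when E is at most half the total claim, and constrained equal losses on the remainder otherwise. The sketch below implements this standard rule on the classic example; how BankFair maps user traffic and exposure demands onto the estate and claims is specific to the paper and is not reproduced here.

```python
def _solve(f, lo, hi, target, iters=100):
    """Bisection for a monotone increasing f on [lo, hi] with f(lo) <= target <= f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def constrained_equal_awards(estate, claims):
    lam = _solve(lambda x: sum(min(c, x) for c in claims), 0.0, max(claims), estate)
    return [min(c, lam) for c in claims]

def constrained_equal_losses(estate, claims):
    # awards are max(c - mu, 0); their sum decreases in mu, so bisect on the negated sum
    mu = _solve(lambda x: -sum(max(c - x, 0) for c in claims), 0.0, max(claims), -estate)
    return [max(c - mu, 0) for c in claims]

def talmud_rule(estate, claims):
    """Aumann-Maschler (Talmud) division of `estate` among `claims`."""
    half = [c / 2 for c in claims]
    if estate <= sum(half):
        return constrained_equal_awards(estate, half)
    rest = constrained_equal_losses(estate - sum(half), half)
    return [h + r for h, r in zip(half, rest)]

# Classic example from the Talmud: claims 100, 200, 300.
for estate in (100, 200, 300):
    print(estate, [round(x, 2) for x in talmud_rule(estate, [100, 200, 300])])
# -> roughly [33.33, 33.33, 33.33], [50, 75, 75], [50, 100, 150]
```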
In this paper, we address the problem of unsupervised video anomaly detection (UVAD). The task aims to detect abnormal events in test videos using unlabeled videos as training data. The presence of anomalies in the training data poses a significant challenge, particularly because they form clusters in the feature space. We refer to this property as the "Anomaly Cluster" issue. The condensed nature of these anomalies makes it difficult to distinguish between normal and abnormal data in the training set. Consequently, training conventional anomaly detection techniques on an unlabeled dataset often leads to sub-optimal results. To tackle this difficulty, we propose a new method called Cleansed k-Nearest Neighbor (CKNN), which explicitly filters out the Anomaly Clusters by cleansing the training dataset. Applying the k-nearest neighbor algorithm in the feature space provides powerful anomaly detection capability. Although the identified Anomaly Cluster issue presents a significant challenge to applying k-nearest neighbor in UVAD, our proposed cleansing scheme effectively addresses this problem. We evaluate the proposed method on various benchmark datasets and demonstrate that CKNN outperforms the previous state-of-the-art UVAD method by up to 8.5% (from 82.0 to 89.0) in terms of AUROC. Moreover, we emphasize that the performance of the proposed method is comparable to that of the state-of-the-art method trained using anomaly-free data.
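The kNN-distance scoring that CKNN builds on can be sketched in a few lines: each test sample is scored by its distance to the k-th nearest neighbor in a reference (training) set, with larger distances indicating anomalies. The cleansing step that decides which training samples to drop is the paper's contribution and is not reproduced here; the reference set below simply stands in for an already-cleansed training set.

```python
import numpy as np

def knn_anomaly_scores(test_feats, reference_feats, k=5):
    """Score each test sample by its distance to the k-th nearest neighbor
    in the reference (training) set; larger scores indicate anomalies."""
    diffs = test_feats[:, None, :] - reference_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)            # (n_test, n_ref)
    return np.sort(dists, axis=1)[:, k - 1]           # distance to the k-th NN

# Toy usage: normal data near the origin, one far-away test point.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(500, 8))       # stand-in for a cleansed training set
test = np.vstack([rng.normal(0.0, 1.0, size=(5, 8)),
                  np.full((1, 8), 6.0)])               # last row is anomalous
print(np.round(knn_anomaly_scores(test, reference), 2))
```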
Graph neural networks (GNNs) are vulnerable to adversarial attacks, which aim to degrade the performance of GNNs through imperceptible changes to the graph. However, we find that the prevalent meta-gradient-based attacks, which utilize the gradient of the loss w.r.t. the adjacency matrix, are in fact biased towards training nodes. That is, their meta-gradient is determined by the training procedure of the surrogate model, which is trained solely on the training nodes. This bias manifests as an uneven perturbation that connects two nodes when at least one of them is a labeled node, i.e., a training node, while it is unlikely to connect two unlabeled nodes. However, these biased attack approaches are sub-optimal because they do not consider flipping edges between two unlabeled nodes at all. This means that they miss potential attack edges between unlabeled nodes that could significantly alter a node's representation. In this paper, we investigate the meta-gradients to uncover the root cause of the uneven perturbations of existing attacks. Based on our analysis, we propose a Meta-gradient-based attack method using a contrastive surrogate objective (Metacon), which alleviates the bias in the meta-gradient via a new surrogate loss. We conduct extensive experiments on benchmark datasets to show that Metacon outperforms existing meta-gradient-based attack methods, demonstrating that alleviating the bias towards training nodes is effective in attacking the graph structure.
Graph neural networks (GNNs) have recently demonstrated significant success. Active learning for GNNs aims to query valuable samples from the unlabeled data for annotation to maximize the GNNs' performance at a low cost. However, most existing methods for reinforced active learning in GNNs may lead to a highly imbalanced class distribution, especially in highly skewed class scenarios, which further adversely affects classification performance. To tackle this issue, in this paper we propose a novel reinforced class-balanced active learning framework for GNNs, namely GraphCBAL. It learns an optimal policy to acquire class-balanced and informative nodes for annotation, maximizing the performance of GNNs trained with the selected labeled nodes. GraphCBAL designs class-balance-aware states, as well as a reward function that achieves a trade-off between model performance and class balance. We further upgrade GraphCBAL to GraphCBAL++ by introducing a punishment mechanism to obtain a more class-balanced labeled set. Extensive experiments on multiple datasets demonstrate the effectiveness of the proposed approaches, achieving superior performance over state-of-the-art baselines. In particular, our methods strike a balance between classification performance and class balance. We provide our code and data at https://github.com/cici-chengcheng/GraphCBAL.
The attention mechanism has the advantage of handling long-term correlations and has been widely adopted in multivariate time series (MTS) prediction. As an important application of MTS, traffic flow prediction is nowadays most commonly addressed with transformer-based prediction models. With the attention mechanism alone, these models can learn spatio-temporal correlations from traffic data. However, recent linear prediction models have called into question the effectiveness of current transformer-based models under certain conditions, which opens possibilities for more efficient approaches. We rethink the role of the attention mechanism in spatio-temporal modeling from a decoupling perspective and propose DEC-Former for traffic flow prediction. Specifically, the trend and seasonal parts of the time series data, the geographical adjacency of the nodes in the road network, and the traditional encoder-decoder architecture are respectively decoupled. Such decoupling leverages the attention mechanism's strength in capturing long-term and long-range correlations. Extensive experiments on four real-world datasets demonstrate better predictive performance and efficiency than state-of-the-art attention-based models. Two case studies further illustrate its practical effects.
The inherent complexity of real-world time series data, combined with the cost and infeasibility of manual labeling, presents considerable challenges to time series representation learning. Most existing studies tend to utilize data augmentation techniques to construct positive and negative samples and leverage a contrastive learning framework to generate time series representations. However, they typically employ simple data augmentation techniques, such as jitter and cropping, to construct positive samples while randomly selecting irrelevant samples as negative ones, which are easily distinguished and unable to guide contrastive learning to capture subtle discriminative features. Furthermore, they usually employ only a single positive sample for contrastive learning, which is insufficient to model diversity and hurts robustness. To address these issues, this paper proposes a Time Series representation learning framework via Dual Reference Contrasting (TS-DRC). Specifically, we first utilize the Markov transition field or Gramian angular field to transform the anchor time series sample into image representations, which are adopted as positive samples. Then, we incorporate two positive samples (dual references) and one negative sample into the contrastive learning framework and devise a novel optimization objective to guide the model to capture more discriminative features, mitigate overfitting, and enhance robustness. Extensive experiments conducted on four public real-world datasets demonstrate that our TS-DRC outperforms other state-of-the-art baselines. Our code is available at: https://github.com/yurui12138/TS-DRC.
Training social event detection models through federated learning (FedSED) aims to improve participants' performance on the task. However, existing federated learning paradigms are inadequate for achieving FedSED's objective and exhibit limitations in handling the inherent heterogeneity in social data. This paper proposes a personalized federated learning framework with a dual aggregation mechanism for social event detection, namely DAMe. We present a novel local aggregation strategy utilizing Bayesian optimization to incorporate global knowledge while retaining local characteristics. Moreover, we introduce a global aggregation strategy to provide clients with maximum external knowledge of their preferences. In addition, we incorporate a global-local event-centric constraint to prevent local overfitting and ''client-drift''. Experiments within a realistic simulation of a natural federated setting, utilizing six social event datasets spanning six languages and two social media platforms, along with an ablation study, have demonstrated the effectiveness of the proposed framework. Further robustness analyses have shown that DAMe is resistant to injection attacks.
In recent years, there has been a trend in the field of recommender systems towards multi-task modeling and multi-scenario modeling. The aim is to enhance the performance of various tasks and scenarios by jointly training on multiple tasks or scenarios to learn common patterns and features. Joint modeling of tasks and scenarios has also received widespread attention recently. However, despite the rich set of methods proposed for Multi-Task Learning (MTL), Multi-Scenario Learning (MSL), and Multi-Task-Multi-Scenario Learning (MTMSL) in recent years, a comprehensive benchmark for evaluating these methods is still lacking. Previous studies often employed different datasets, data processing techniques, data partitioning strategies, and hyperparameter settings, making replication of existing research and fair comparison of experimental results challenging. To address this challenge, we introduce MMLRec, the first unified comprehensive benchmark for evaluating MTL, MSL, and MTMSL, featuring consistent dataset processing and identical parameter settings. This benchmark implements a range of MTL, MSL, and MTMSL algorithms and evaluates them on multiple commonly used recommender systems datasets. Through fair comparative experiments, we find that some structurally simple recommendation algorithms are underestimated, as they can achieve results comparable to more complex algorithms while maintaining lower complexity. Furthermore, our experimental analysis indicates that more complex methods exhibit better robustness when there are significant differences between tasks or scenarios. By providing a unified framework (MMLRec), our goal is to promote rapid evaluation and inspire innovative research in this continuously evolving field. We hope that our open-source benchmark can facilitate swift, equitable evaluations while also fostering further breakthrough research in the domains of MTL, MSL, and MTMSL.
A Bayesian network (BN) is a directed acyclic graph (DAG) representing the dependence relations among random variables, together with conditional probability tables (CPTs). Most existing approximate inference methods cannot guarantee the efficiency and accuracy of multiple probabilistic inferences in a BN. To address this issue, we propose Transformer-based BN embedding (TBNE) and TBNE-based probabilistic inference. Specifically, we first adopt mutual information to measure the weight of parent-child node pairs and transform the BN into multiple bidirectional weighted graphs (BWGs), while preserving the DAG and CPTs. Then, we redesign the Transformer model by incorporating node importance and shortest-path encodings, and extend the self-attention module of the Transformer to generate node embeddings of the BWGs. Next, we cast probabilistic inference as maximizing the decoded information of paths in the BN from the perspective of information theory. Finally, we give an efficient algorithm for multiple probabilistic inferences by calculating embedding similarities between evidence and query nodes in the BN. Experimental results show that our inference method is more efficient than state-of-the-art competitors by several orders of magnitude while producing almost the same results.
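Weighting a parent-child pair by mutual information needs only the pair's joint distribution, which can be assembled from the parent's marginal and the child's CPT. The toy computation below illustrates the standard formula I(X;Y) = sum_{x,y} p(x,y) log [p(x,y) / (p(x)p(y))]; the numbers are made up, and TBNE's Transformer machinery is not touched.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in nats) of two discrete variables, given their
    joint distribution as a 2-D array that sums to 1."""
    px = joint.sum(axis=1, keepdims=True)   # marginal of the parent
    py = joint.sum(axis=0, keepdims=True)   # marginal of the child
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))

# Toy parent-child pair: P(X) and the child's CPT P(Y | X) give the joint P(X, Y).
p_x = np.array([0.6, 0.4])                          # P(X = 0), P(X = 1)
cpt_y_given_x = np.array([[0.9, 0.1],               # P(Y | X = 0)
                          [0.2, 0.8]])              # P(Y | X = 1)
joint = p_x[:, None] * cpt_y_given_x                # P(X, Y)
print(round(mutual_information(joint), 4))          # edge weight for this parent-child pair
```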
Knowledge graphs have proven vital for efficient data management, enhanced search capabilities, and improved decision-making in various information technology domains. However, constructing reliable knowledge graphs in decentralized ecosystems, with distributed autonomous actors, poses significant challenges related to asynchronous transmission, out-of-order knowledge-sharing, device heterogeneity, and trust issues. These challenges are also present in resource orchestration within multi-cloud edge ecosystems where multiple stakeholders must collaborate and share information to enable next-gen smart applications. In this paper, we propose a novel system design that utilizes Distributed Ledger Technology to build knowledge graphs. This approach ensures consistent and trustworthy knowledge sharing among orchestrators in a cloud-edge continuum. Our solution accommodates diverse requirements of both cloud and edge servers, allowing clients to construct complete historic graphs or build filtered sub-graphs. We deploy our solution in a multi-cloud edge environment and construct knowledge graphs representing the system state, including clusters, servers, microservices, and various resources. We validate the feasibility and performance of our solution through a real-world deployment and experiments in a smart shopping use case. Results demonstrate that the proposed solution achieves the claimed benefits with minimal or acceptable delays in comparison to traditional event streaming services.
Automatic taxonomy induction is crucial for web search, recommendation systems, and question answering. Manual curation of taxonomies is expensive in terms of human effort, making automatic taxonomy construction highly desirable. In this work, we introduce Chain-of-Layer, an in-context learning framework designed to induce taxonomies from a given set of entities. Chain-of-Layer breaks down the task into selecting relevant candidate entities at each layer and gradually building the taxonomy from top to bottom. To minimize errors, we introduce the Ensemble-based Ranking Filter to reduce the hallucinated content generated at each iteration. Through extensive experiments, we demonstrate that Chain-of-Layer achieves state-of-the-art performance on four real-world benchmarks. Source code is available at: https://github.com/qingkaizeng/chain-of-layer.
Temporal knowledge graph alignment (TKGA) discovers equivalent elements among heterogeneous temporal knowledge graphs (TKGs) and can thus increase the coverage of a given TKG. However, existing TKGA datasets fail to mirror real-life challenges, and their oversimplified scenarios may even impede the fair comparison and development of alignment solutions. To address these issues, in this work we propose to benchmark the challenges of temporal knowledge graph alignment by establishing a new dataset, BETA, which features multi-granular temporal information, a more realistic quadruple distribution, and new challenging alignment scenarios. Furthermore, we also offer a simple yet effective solution, MGTEA, to address these challenges, which effectively models the complex structural and multi-granular temporal features to facilitate alignment. Extensive experiments reveal that BETA indeed better mirrors real-life challenges and that there is still room for developing more advanced solutions to address these difficulties, despite the superior performance achieved by MGTEA.
Multimodal knowledge bases (MMKBs) provide cross-modal aligned knowledge crucial for multimodal tasks. However, the images in existing MMKBs are generally collected for entities in encyclopedia knowledge graphs. Therefore, detailed groundings of visual semantics with linguistic concepts are lacking, which are essential for the visual concept cognition ability of multimodal models. Addressing this gap, we introduce M2 ConceptBase, the first concept-centric MMKB. M2 ConceptBase models concepts as nodes with associated images and detailed textual descriptions. We propose a context-aware multimodal symbol grounding approach to align concept-image and concept-description pairs using context information from image-text datasets. Comprising 951K images and 152K concepts, M2 ConceptBase links each concept to an average of 6.27 images and a single description, ensuring comprehensive visual and textual semantics. Human studies confirm more than 95% alignment accuracy, underscoring its quality. Additionally, our experiments demonstrate that M2 ConceptBase significantly enhances VQA model performance on the OK-VQA task. M2 ConceptBase also substantially improves the fine-grained concept understanding capabilities of multimodal large language models through retrieval augmentation in two concept-related tasks, highlighting its value.
The generation of medical dialogue notes is essential in healthcare, providing a structured recapitulation of patient-provider interactions. Medical notes are rigorously organized into various sections, including Chief Complaint, History of Present Illness, and more; each section serves a specific purpose in recording detailed medical content. Traditionally, this task is labor-intensive, requiring physicians to manually create notes, a process prone to errors. With advancements in AI, it is now feasible to automate the generation of medical notes. There are two main categories of methods for automatic medical note generation: pre-trained language models (PLMs) and in-context learning (ICL). PLMs struggle with unstructured outputs, limited datasets, and inadequate medical terminology, while ICL methods improve accuracy and reduce data requirements but still produce unstructured notes and incur high time and monetary costs. To tackle the above challenges, we propose a three-module framework, called CE-DEPT, for accurate, efficient, and cost-effective medical note generation. Specifically, the Task Decomposition Module breaks down complete medical dialogues into section-specific dialogues to ensure relevance and accuracy. The Batch Combination Module groups these sections into batches based on disease similarity to reduce costs and improve efficiency. The Note Generation Module employs batch prompting with ICL to generate each section note, then combines them into a structured, comprehensive medical note. Experiments on benchmark datasets demonstrate the effectiveness of Task Decomposition and Batch Prompting. Our method, CE-DEPT, outperforms the best baseline by 5% on ROUGE-1 and 3% on BERTScore-F1, with a 15% improvement in cost-effectiveness and a 25% reduction in time consumption at peak accuracy.
Existing Conversational Recommender Systems (CRS) predominantly utilize user simulators for training and evaluating recommendation policies. These simulators often oversimplify the complexity of user interactions by focusing solely on static item attributes, neglecting the rich, evolving preferences that characterize real-world user behavior. This limitation frequently leads to models that perform well in simulated environments but falter in actual deployment. Addressing these challenges, this paper introduces the Tri-Phase Offline Policy Learning-based Conversational Recommender System (TPCRS), which significantly reduces dependency on real-time interactions and mitigates overfitting issues prevalent in traditional approaches. TPCRS integrates a model-based offline learning strategy with a controllable user simulation that dynamically aligns with both personalized and evolving user preferences. Through comprehensive experiments, TPCRS demonstrates enhanced robustness, adaptability, and accuracy in recommendations, outperforming traditional CRS models in diverse user scenarios. This approach not only provides a more realistic evaluation environment but also facilitates a deeper understanding of user behavior dynamics, thereby refining the recommendation process.
This paper introduces OrthoReg, a simple yet effective Graph-regularized MLP model for semi-supervised node representation learning. We first demonstrate, through empirical observations and theoretical analysis, that node embeddings learned from conventional GR-MLPs suffer from the over-correlation issue. This issue arises when a few dominant singular values overwhelm the embedding space, leading to the limited expressive power of the learned node representations. To mitigate this problem, we propose a novel GR-MLP model called OrthoReg. By incorporating a soft regularization loss on the correlation matrix of node embeddings, OrthoReg explicitly encourages orthogonal node representations, effectively avoiding over-correlated representations. Compared to the currently popular GNN models, our OrthoReg possesses two distinct advantages: 1) Much faster inference speed, particularly for large-scale graphs. 2) Significantly superior performance in inductive cold-start settings. Experiments on semi-supervised node classification tasks, together with the extensive ablation studies, have demonstrated the effectiveness of the proposed designs.
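A soft regularization loss on the correlation matrix of embeddings can be sketched as follows: standardize each embedding dimension across nodes, form the dimension-wise correlation matrix, and penalize its off-diagonal mass. This is a generic decorrelation penalty written under my own normalization choices, not OrthoReg's exact loss.

```python
import numpy as np

def decorrelation_penalty(embeddings, eps=1e-8):
    """Soft orthogonality penalty on node embeddings.

    embeddings: (n_nodes, d) matrix. Each dimension is standardized across
    nodes, the d x d correlation matrix is formed, and the squared sum of
    its off-diagonal entries is returned.
    """
    z = embeddings - embeddings.mean(axis=0, keepdims=True)
    z = z / (z.std(axis=0, keepdims=True) + eps)
    corr = (z.T @ z) / z.shape[0]                   # d x d correlation matrix
    off_diag = corr - np.diag(np.diag(corr))
    return float(np.sum(off_diag ** 2))

rng = np.random.default_rng(0)
independent = rng.normal(size=(1000, 16))
collapsed = independent[:, :1] + 0.01 * rng.normal(size=(1000, 16))  # nearly identical dimensions
print(decorrelation_penalty(independent))   # small: dimensions are nearly uncorrelated
print(decorrelation_penalty(collapsed))     # large: dimensions are almost copies of one another
```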
We introduce InfoMLP, an innovative model structured like a Multilayer Perceptron (MLP) for semi-supervised classification of structured data, e.g., graphs. InfoMLP was inspired by our observation that overlapping information between node features and the structure between data points significantly influences the performance gap between feature-only MLPs and advanced graph-based semi-supervised methods, e.g., GNNs. To quantify the overlapping information, we first introduce a tractable metric to quantify the mutual information between node features and graph structure. Based on this, we propose InfoMLP, which seeks to maximize the mutual information between node embeddings derived from the MLP and the structure information. Our info-max objective is split into two sub-objectives: the first is a non-parametric preprocessing step aiming to find the optimal graph-augmented node feature matrix that captures the maximal information about the graph structure, while the second sub-objective is to maximize the mutual information between node embeddings generated from the original node features and those from the graph-augmented node features. Since the message-passing operation is integrated into the preprocessing step, requiring only a single execution per dataset, InfoMLP maintains the same efficiency as a vanilla MLP during both training and testing. We validate the efficacy of our design through experiments on real-world datasets of varying scales supplemented by comprehensive ablation studies. Our results corroborate our analysis and demonstrate the effectiveness of our novel approach.
In real-world recommender systems, user engagement and subjective feedback play pivotal roles in shaping the content distribution mechanism of the platform. When platforms reach a certain scale, they often gather valuable questionnaire feedback data from users to evaluate their satisfaction with recommended items. Compared to traditional user feedback such as likes, questionnaires explicitly capture both satisfaction and dissatisfaction and are unaffected by other users' questionnaires, thus better expressing users' true preferences. In this paper, we aim to leverage the questionnaire feedback to align the recommendation model with users' true preferences. However, due to the platform distribution mechanism and divergent user attitudes toward questionnaires, the questionnaire feedback data frequently becomes sparse and exhibits selection biases, resulting in challenges in feature integration and training process. To address these issues, we introduce a novel user Satisfaction Alignment framework that effectively leverages Questionnaire feedback to enhance Recommendation, named SAQRec. SAQRec begins by training an unbiased satisfaction model to impute satisfaction, addressing selection bias and data sparsity. Then, SAQRec aligns features with users' true preferences by disentangling satisfaction and dissatisfaction from click history and categorizing clicked items into multiple satisfaction levels through the imputed satisfactions. Additionally, the imputed satisfactions from the pre-trained unbiased satisfaction model serve as pseudo-labels to align the model's outputs with users' true preferences. Extensive experiments on both public and commercial datasets demonstrate SAQRec's superior integration of questionnaire feedback in recommendation models. Online A/B testing on a short video platform confirms its effectiveness in boosting user watch time and positive-to-negative feedback ratio, enhancing overall performance and user satisfaction.
Federated recommender systems are used to address privacy issues in recommendations. Among them, FedVAE extends the representative non-linear recommendation method MultVAE. However, the bottleneck of FedVAE lies in its communication load during training, as the parameter volume of its first and last layers scales with the number of items. This leads to significant communication costs during the model's transmission phases (distribution and upload), making FedVAE's deployment extremely challenging. To address these challenges, we propose an Efficient Federated Variational AutoEncoder for collaborative filtering, EFVAE, whose core is the Federated Collaborative Importance Sampling (FCIS) method. FCIS reduces communication costs through a client-to-server collaborative sampling mechanism and provides satisfactory recommendation performance through dynamic multi-stage approximation of the decoding distribution. Extensive experiments and analyses on real-world datasets confirm that EFVAE significantly reduces communication costs, by up to 94.51%, while maintaining recommendation performance. Moreover, its recommendation performance is better on sparse datasets, with improvements of up to 13.79%.
User-item interaction data in collaborative filtering and graph modeling tasks often exhibits power-law characteristics, which suggests the suitability of hyperbolic space modeling. Hyperbolic Graph Convolutional Neural Networks (HGCNs), a novel technique that leverages the advantages of GCNs and hyperbolic space, have achieved remarkable results. However, existing HGCN methods have several drawbacks: they fail to fully leverage the properties of hyperbolic space due to arbitrary embedding initialization and imprecise tangent-space aggregation; they overlook auxiliary information that could enrich the collaborative graph; and their training converges slowly due to the margin ranking loss and random negative sampling. To overcome these challenges, we propose Hyperbolic Graph Collaborative for Heterogeneous Recommendation (HGCH), an enhanced HGCN-based model for collaborative filtering that integrates diverse side information into a heterogeneous collaborative graph and improves training convergence speed. HGCH first preserves the long-tailed nature of the graph by initializing node embeddings with a power-law prior; it then aggregates neighbors in hyperbolic space using the gyromidpoint method for accurate computation; finally, it fuses multiple embeddings from different hyperbolic spaces through gated fusion with the prior. Moreover, HGCH employs hyperbolic user-specific negative sampling to speed up convergence. We evaluate HGCH on four real datasets, and the results show that HGCH achieves competitive results and outperforms leading baselines, including HGCNs. Extensive ablation studies further confirm its effectiveness.
Knowledge graphs constantly evolve with new entities emerging, existing definitions being revised, and entity relationships changing. These changes lead to temporal degradation in entity linking models, characterized as a decline in model performance over time. To address this issue, we propose leveraging graph relationships to aggregate information from neighboring entities across different time periods. This approach enhances the ability to distinguish similar entities over time, thereby minimizing the impact of temporal degradation. We introduce CYCLE: Cross-Year Contrastive Learning for Entity-Linking. This model employs a novel graph contrastive learning method to tackle temporal performance degradation in entity linking tasks. Our contrastive learning method treats newly added graph relationships as positive samples and newly removed ones as negative samples. This approach helps our model effectively prevent temporal degradation, achieving a 13.90% performance improvement over the state-of-the-art from 2023 when the time gap is one year, and a 17.79% improvement as the gap expands to three years. Further analysis shows that CYCLE is particularly robust for low-degree entities, which are less resistant to temporal degradation due to their sparse connectivity, making them particularly suitable for our method. The code and data are made available at https://github.com/pengyu-zhang/CYCLE-Cross-Year-Contrastive-Learning-in-Entity-Linking
Despite the notable progress achieved by large-scale vision-language pre-training models in a wide range of multi-modal tasks, their performance often falls short in image-text matching challenges that require an in-depth understanding of structured representations. For instance, when distinguishing between texts or images that are generally similar but have distinct structured knowledge (such as entities and relationships in text, or objects and object attributes in images), the model's capabilities are limited. In this paper, we propose MSKR, which advances Multi-modal Structured Knowledge Representation with synergistic hard negative samples, thereby significantly improving the model's matching capability on such data. Specifically, our model comprises a structured knowledge-enhanced encoder designed to bolster the structured knowledge inherent in textual data, such as entities, their attributes, and the relationships among these entities, as well as structured knowledge within images, focusing on elements like objects and their attributes. To further refine the model's learning process, we produce challenging negative samples for both images and text. Extensive experimental evaluations on the Winoground, InpaintCOCO, and MSCOCO benchmarks reveal that MSKR significantly outperforms the baseline model, showcasing marked improvements of 2.66% on average in structured representation learning compared to the baseline. Moreover, general representation results illustrate that our model not only excels in structured representation learning but also maintains its proficiency in general representation learning.
Recommender systems embody significant commercial value and represent crucial intellectual property. However, the integrity of these systems is constantly challenged by malicious actors seeking to steal their underlying models. Safeguarding against such threats is paramount to upholding the rights and interests of the model owner. While model watermarking has emerged as a potent defense mechanism in various domains, its direct application to recommender systems remains unexplored and non-trivial. In this paper, we address this gap by introducing Autoregressive Out-of-distribution Watermarking (AOW), a novel technique tailored specifically for recommender systems. Our approach entails selecting an initial item and querying it through the oracle model, followed by the selection of subsequent items with small prediction scores. This iterative process generates a watermark sequence autoregressively, which is then ingrained into the model's memory through training. To assess the efficacy of the watermark, the model is tasked with predicting the subsequent item given a truncated watermark sequence. Through extensive experimentation and analysis, we demonstrate the superior performance and robust properties of AOW. Notably, our watermarking technique exhibits high-confidence extraction capabilities and maintains effectiveness even in the face of distillation and fine-tuning processes.
Temporal Knowledge Graph (TKG) extrapolation aims to predict future missing facts based on historical information, which exhibits both the semantics and the topology of events. Mainstream methods have advanced prediction performance by exploring the potential of topology representations of TKGs based on dedicated temporal Graph Neural Networks (GNNs). Recently, a few Language Model (LM)-based methods have attempted to model the semantic representations of TKGs; however, they lack specific designs for topology information. Therefore, we propose a Semantic TOpology REpresentation learning (STORE) framework, enhanced by LMs, to bridge the gap between the semantics and topology of TKGs. Firstly, we tackle the challenge of modeling long histories of facts with a time-aware sampling strategy based on semantic priors to extract concise yet precise facts. Secondly, we handle the challenge of the interaction between topology and semantics by transforming graph representations into virtual tokens that are then integrated with generated prompts and fed into LMs. Finally, multi-head attention is adopted to obtain better semantic topology representations, thereby achieving joint optimization of both temporal GNNs and LMs. Extensive experiments on five datasets show that our STORE outperforms state-of-the-art GNN- and LM-based methods.
Data imputation is a crucial task due to the widespread occurrence of missing data. Many methods adopt a two-step approach: initially crafting a preliminary imputation (the "draft") and then refining it to produce the final missing data imputation result, commonly referred to as "draft-then-refine". In our study, we examine this prevalent strategy through the lens of graph Dirichlet energy. We observe that a basic "draft" imputation tends to decrease the Dirichlet energy. Therefore, a subsequent "refine" step is necessary to restore the overall energy balance. Existing refinement techniques, such as the Graph Convolutional Network (GCN), often result in further energy reduction. To address this, we introduce a new framework, the Graph Laplacian Pyramid Network (GLPN). GLPN incorporates a U-shaped autoencoder and residual networks to capture both global and local details effectively. Through extensive experiments on multiple real-world datasets, GLPN consistently outperforms state-of-the-art methods across three different missing data mechanisms. The code is available at https://github.com/liguanlue/GLPN.
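For reference, the unnormalized graph Dirichlet energy of a node signal X is tr(X^T L X) = 1/2 * sum_ij A_ij ||x_i - x_j||^2, where L = D - A is the graph Laplacian; GLPN's analysis may use a degree-normalized variant, so the snippet below is only meant to make the quantity concrete.

```python
import numpy as np

def dirichlet_energy(adjacency, signal):
    """Graph Dirichlet energy tr(X^T L X) = 1/2 * sum_ij A_ij ||x_i - x_j||^2,
    where L = D - A is the unnormalized graph Laplacian."""
    degrees = adjacency.sum(axis=1)
    laplacian = np.diag(degrees) - adjacency
    return float(np.trace(signal.T @ laplacian @ signal))

# Toy check on the path graph 0-1-2: a smooth signal has lower energy than a rough one.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
smooth = np.array([[0.0], [0.5], [1.0]])
rough  = np.array([[0.0], [1.0], [0.0]])
print(dirichlet_energy(A, smooth), dirichlet_energy(A, rough))  # 0.5 vs 2.0
```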
The efficiency and scalability of graph convolution networks (GCNs) in training recommender systems (RecSys) have been persistent concerns, hindering their deployment in real-world applications. This paper presents a critical examination of the necessity of graph convolutions during the training phase and introduces an innovative alternative: the Light Post-Training Graph Ordinary-Differential-Equation (LightGODE). Our investigation reveals that the benefits of GCNs are more pronounced during testing rather than training. Motivated by this, LightGODE utilizes a novel post-training graph convolution method that bypasses the computation-intensive message passing of GCNs and employs a non-parametric continuous graph ordinary-differential-equation (ODE) to dynamically model node representations. This approach drastically reduces training time while achieving fine-grained post-training graph convolution to avoid the distortion of the original training embedding space, termed the embedding discrepancy issue. We validate our model across several real-world datasets of different scales, demonstrating that LightGODE not only outperforms GCN-based models in terms of efficiency and effectiveness but also significantly mitigates the embedding discrepancy commonly associated with deeper graph convolution layers. Our LightGODE challenges the prevailing paradigms in RecSys training and suggests re-evaluating the role of graph convolutions, potentially guiding future developments of efficient large-scale graph-based RecSys.
Graph data has been widely applied due to its powerful expressive capabilities. Releasing raw graph data without preprocessing may leak private information, so generating privacy-protected graphs is necessary for data analysis. Current privacy protection methods for graphs focus on securing attributes like degree distribution, triangle counts, and node information, but they often overlook the need to protect user group relationships. Additionally, some privacy-preserving graph publishing methods introduce significant noise due to the chosen graph generation techniques and the points at which noise is added. This paper proposes an effective differentially private graph synthesis algorithm named DPCAG (Differentially Private Community Affiliation Graph Generation Model) for protecting user group relationships. Firstly, we observe that the adjacency matrix D generated from the affiliation matrix F contains numerous small probabilities; directly using it to construct the graph G would produce a substantial number of redundant edges. Therefore, we introduce a generation threshold theta to filter out unnecessary edges. Secondly, to achieve a better balance between data availability and the level of privacy protection, two budget allocation schemes are designed based on k-truss, which describes the tightness of group relationships. Lastly, we prove mathematically that the proposed model satisfies differential privacy, and the effectiveness of DPCAG is validated on four real graph datasets.
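A toy sketch of the thresholding idea: assuming edge probabilities follow the standard community-affiliation graph model p_ij = 1 - exp(-F_i·F_j), entries below the generation threshold theta are dropped before sampling, which avoids the flood of redundant low-probability edges noted above. DPCAG's noise-addition and budget-allocation steps are not reproduced here; the function name is illustrative.

```python
import numpy as np

def sample_graph_from_affiliations(F, theta=0.05, rng=None):
    """Sketch of affiliation-graph generation with a generation threshold theta.

    F: (n_nodes, n_communities) non-negative affiliation matrix.
    Edge probabilities are p_ij = 1 - exp(-F_i . F_j); probabilities below
    theta are zeroed out before Bernoulli sampling of the upper triangle.
    """
    rng = np.random.default_rng() if rng is None else rng
    P = 1.0 - np.exp(-F @ F.T)          # pairwise edge probabilities
    np.fill_diagonal(P, 0.0)
    P[P < theta] = 0.0                  # generation threshold filters weak edges
    upper = np.triu(rng.random(P.shape) < P, k=1)
    return (upper | upper.T).astype(int)

F = np.abs(np.random.randn(6, 2))       # 6 nodes, 2 communities
A = sample_graph_from_affiliations(F, theta=0.1)
```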
Offline reinforcement learning (RL) is an effective tool for real-world recommender systems with its capacity to model the dynamic interest of users and its interactive nature. Most existing offline RL recommender systems focus on model-based RL through learning a world model from offline data and building the recommendation policy by interacting with this model. Although these methods have made progress in the recommendation performance, the effectiveness of model-based offline RL methods is often constrained by the accuracy of the estimation of the reward model and the model uncertainties, primarily due to the extreme discrepancy between offline logged data and real-world data in user interactions with online platforms. To fill this gap, a more accurate reward model and uncertainty estimation are needed for the model-based RL methods. In this paper, a novel model-based Reward Shaping in Offline Reinforcement Learning for Recommender Systems, ROLeR, is proposed for reward and uncertainty estimation in recommendation systems. Specifically, a non-parametric reward shaping method is designed to refine the reward model. In addition, a flexible and more representative uncertainty penalty is designed to fit the needs of recommendation systems. Extensive experiments conducted on four benchmark datasets showcase that ROLeR achieves state-of-the-art performance compared with existing baselines. Source code can be downloaded at this address.
Food recommendation systems play a pivotal role in shaping dietary salubrity and fostering sustainable lifestyles by recommending recipes and foodstuffs that align with user preferences. Metadata information of a recipe, encompassing multi-modal descriptions, constituent ingredients, and health-related attributes, can furnish a more holistic perspective on the recipe's profile, thereby augmenting recommendation performance. However, existing state-of-the-art methods often overlook the inherent interdependencies between modalities, ingredients, and health factors, leaving the health information pertaining to recipe characteristics underexploited. Notably, our preliminary investigation on two datasets unveiled that the semantic divergence between health-related knowledge and collaborative filtering signals is more pronounced in comparison to other metadata information, thereby potentially impeding the efficacy of food recommendation systems. To address these limitations, we propose HealthRec, a novel multi-modal food recommendation framework with health-aware knowledge distillation. HealthRec employs a global graph representation learning module to capture high-order dependencies across diverse food-related relations, enriching the representations. Subsequently, a co-attention network is leveraged to capture local, recipe-level knowledge transfer between modality-related and ingredient-related embeddings. Additionally, we exploit external supervision signals derived from WHO recommendations, utilizing knowledge distillation during the training phase to transfer local health-aware knowledge into global collaborative embeddings. Extensive experimentation on real-world datasets demonstrates HealthRec's superiority compared to current state-of-the-art recommendation baselines, highlighting its effectiveness in modeling health-aware food recommendations.
Cross-domain recommendation (CDR) aims to suggest items from new domains that align with potential user preferences, based on their historical interactions. Existing methods primarily focus on acquiring item representations by discovering user preferences under specific, yet possibly redundant, item features. However, user preferences may be more strongly associated with interacted items at higher semantic levels, rather than specific item features. Consequently, this item feature-focused recommendation approach can easily become suboptimal or even obsolete when conducting CDR with disturbances of these redundant features. In this paper, we propose a novel Preference Prototype-Aware (PPA) learning method to quantitatively learn user preferences while minimizing disturbances from the source domain. The PPA framework consists of two complementary components: a mix-encoder and a preference prototype-aware decoder, forming an end-to-end unified framework suitable for various real-world scenarios. The mix-encoder employs a mix-network to learn better general representations of interacted items and capture the intrinsic relationships between items across different domains. The preference prototype-aware decoder implements a learnable prototype matching mechanism to quantitatively perceive user preferences, which can accurately capture user preferences at a higher semantic level. This decoder can also avoid disturbances caused by item features from the source domain. The experimental results on public benchmark datasets in different scenarios demonstrate the superiority of the proposed PPA learning method compared to state-of-the-art counterparts. PPA excels not only in providing accurate recommendations but also in offering reliable preference prototypes. Our code is available at https://github.com/zyx-nuaa/PPA-for-CDR.
Recent studies on knowledge graph question answering (KGQA) have focused on tackling complex inquiries to enhance the applicability of models in real-life settings. Unfortunately, KGQA models encounter significant challenges due to the lack of high-quality annotated data, making it difficult to accurately answer the diverse range of complex natural language questions posed by users. Inspired by the recent success of Large Language Models (LLMs), the burden associated with manual annotation can be mitigated by utilizing LLMs. However, the data generated directly by LLMs may exhibit a potential distribution discrepancy with real user queries. In this paper, we present an enhancement framework that utilizes Generative Adversarial Imitation Learning (GAIL) to fine-tune LLMs, which can address the challenges inherent in the low-resource KGQA task. Specifically, based on GAIL, the LLM acts as the generator, aiming to output samples resembling expert demonstrations. Meanwhile, we utilize a paired discriminator to assess the authenticity of generated sequences and their relevance to the input SPARQL queries. Additionally, proximal policy optimization is leveraged to stabilize the training of the generator. Furthermore, we employ an automated algorithm to controllably sample various SPARQL queries from the knowledge graph, subsequently transforming them into corresponding natural language questions using fine-tuned LLMs. The synthetic dataset can serve as supplementary data for training lightweight KGQA models in real-world scenarios. Experimental results on the WebQuestionsSP, ComplexWebQuestions, and GrailQA datasets show that our framework achieves state-of-the-art performance in a low-resource setting, even approaching the performance of supervised models.
As real-world graph data continues to grow, training large graphs in a distributed environment is becoming increasingly prevalent. However, network transmission in a distributed environment can hinder subsequent training steps, resulting in suboptimal training performance. Through a comprehensive analysis and experimental demonstration, we discovered that during training there exist certain data that can be computed once and reused multiple times. In addition, we found that after a certain number of training iterations, the parameter updates in each iteration have minimal effect on the parameters. Based on these findings, we improved the original implementation and propose a cache-enhanced distributed graph training system, NeutronCache. It utilizes cached reusable intermediate data and a dynamically adjusted stale embedding reuse strategy, reducing network overhead in distributed systems and accelerating the training process. Experimental validation shows that our implementation achieves speedups ranging from 1.4X to 16.61X on real graph datasets with almost no loss in accuracy.
Visual language navigation (VLN) is an important research direction in embodied AI. It aims to enable an agent to understand the surrounding environment and complete navigation tasks. VLN instructions can be categorized into coarse-grained and fine-grained commands. A fine-grained command describes a whole task step by step through subtasks. In contrast, a coarse-grained command gives an abstract task description, which better suits human habits. Most existing work focuses on the former kind of instruction in VLN tasks, ignoring the latter abstract instructions that arise in daily-life scenarios. To overcome this challenge of abstract instructions, we consider coarse-grained instructions in VLN through event knowledge enhancement. Specifically, we first propose a prompt-based framework to extract an event knowledge graph (named VLN-EventKG) for VLN over multiple mainstream benchmark datasets. Through collaboration between small and large language models, we realize knowledge-enhanced navigation planning (named EventNav) for VLN tasks with coarse-grained instruction input. Additionally, we design a novel dynamic history backtracking module to correct potentially erroneous action planning in real time. Experimental results on various public benchmarks show that our knowledge-enhanced method, using the proposed VLN-EventKG, is superior in coarse-grained-instruction VLN, with over 5% improvement in success rate. Our project is available at https://sites.google.com/view/vln-eventkg
Shapley value attribution (SVA) is an increasingly popular Explainable AI (XAI) approach that has been widely used in many recent applied studies to gain new insights into the underlying information systems. However, most existing SVA methods are error-prone, providing biased or unreliable explanations that fail to correctly capture the informational dependencies between features and model outputs. These explanation errors can be decomposed into two components: 1) observation bias which stems from data sparsity and leads to over-informativeness; and 2) structural bias which stems from distributional assumptions and leads to under-informativeness. To alleviate these biases, in this paper, we propose a series of refinement methods that combine out-of-distribution (OOD) detection and importance sampling. In essence, our methods aim to rectify the distribution drift caused by distributional assumptions. We apply our refinement methods to two popular SVAs: the marginal SVA and the surrogate model-based SVA. Our extensive experiments show that the proposed methods significantly enhance the informativeness of both local and global Shapley value-based explanations.
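As a concrete point of reference for the marginal SVA discussed above, the following is a minimal permutation-sampling estimator of marginal Shapley values in NumPy; `model`, `background`, and `n_perm` are illustrative names, and the refinements via OOD detection and importance sampling proposed in the paper are not shown.

```python
import numpy as np

def marginal_shapley(model, x, background, n_perm=200, rng=None):
    """Permutation-sampling estimate of marginal Shapley values for one input x.

    Features outside the current coalition are imputed with a row drawn from
    the background data (the "marginal" removal strategy). `model` is any
    callable mapping a 2-D array of inputs to a 1-D array of scalar outputs.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        z = background[rng.integers(len(background))].copy()  # all features "absent"
        prev = model(z[None, :])[0]
        for j in order:
            z[j] = x[j]                       # add feature j to the coalition
            curr = model(z[None, :])[0]
            phi[j] += curr - prev             # marginal contribution of j
            prev = curr
    return phi / n_perm
```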
Knowledge Graph Question Generation (KGQG) is the task of generating natural language questions based on a given knowledge graph (KG). Although extensively explored in recent years, prevailing models predominantly depend on labelled data for training deep learning models or employ large parametric frameworks, e.g., Large Language Models (LLMs), which can incur significant deployment costs and pose practical implementation challenges. To address these issues, in this work we put forward a zero-shot, multi-agent KGQG framework. This framework integrates the capabilities of LLMs with small models to facilitate cost-effective, high-quality question generation. Specifically, we develop a professional editorial team architecture accompanied by two workflow optimization tools to reduce unproductive collaboration among LLM-based agents and enhance the robustness of the system. Extensive experiments demonstrate that our proposed framework achieves new state-of-the-art performance on zero-shot KGQG tasks, with relative gains of 20.24% and 13.57% on two KGQG datasets, respectively, rivaling fully supervised state-of-the-art models.
Time series forecasting is a critical and challenging task in practical applications. Recent advancements in pre-trained foundation models for time series forecasting have gained significant interest. However, current methods often overlook the multi-scale nature of time series, which is essential for accurate forecasting. To address this, we propose HiMTM, a hierarchical multi-scale masked time series modeling framework with self-distillation for long-term forecasting. HiMTM integrates four key components: (1) a hierarchical multi-scale transformer (HMT) to capture temporal information at different scales; (2) a decoupled encoder-decoder (DED) that directs the encoder towards feature extraction while the decoder focuses on pretext tasks; (3) hierarchical self-distillation (HSD) for multi-stage feature-level supervision signals during pre-training; and (4) cross-scale attention fine-tuning (CSA-FT) to capture dependencies between different scales for downstream tasks. These components collectively enhance multi-scale feature extraction in masked time series modeling, improving forecasting accuracy. Extensive experiments on seven mainstream datasets show that HiMTM surpasses state-of-the-art self-supervised and end-to-end learning methods by a considerable margin of 3.16-68.54%. Additionally, HiMTM outperforms the latest robust self-supervised learning method, PatchTST, in cross-domain forecasting by a significant margin of 2.3%. The effectiveness of HiMTM is further demonstrated through its application in natural gas demand forecasting.
Knowledge Tracing (KT) and Behavior Modeling (BM) are essential mining and discovery problems in education. KT models student knowledge based on prior performance with learning materials, while BM focuses on patterns such as student preferences, engagement, and procrastination. Traditional research in these areas focuses on each task individually, thereby overlooking their interconnections. However, recent research on multi-activity knowledge tracing suggests that student preferences for learning materials are key to understanding student learning. In this paper, we propose a novel multi-task model, the Multi-Task Student Knowledge and Behavior Model (KTBM), which combines KT and BM to improve both performance and interpretability. KTBM includes a multi-activity KT component and a preference behavior component while enabling robust information transfer between them. We conceptualize this approach as a multi-task learning problem with two objectives: predicting students' performance and their choices concerning learning material types. To address this dual-objective challenge, we employ a Pareto multi-task learning optimization algorithm. Our experiments on three real-world datasets show that KTBM significantly enhances both KT and BM performance, demonstrating improvement across various settings and providing interpretable results.
Providing natural language-based explanations to justify recommendations helps to improve users' satisfaction and gain users' trust. However, as current explanation generation methods are commonly trained with an objective to mimic existing user reviews, the generated explanations are often not aligned with the predicted ratings or some important features of the recommended items, and thus, are suboptimal in helping users make informed decision on the recommendation platform. To tackle this problem, we propose a flexible model-agnostic method named MMI (Maximizing Mutual Information) framework to enhance the alignment between the generated natural language explanations and the predicted rating/important item features. Specifically, we propose to use mutual information (MI) as a measure for the alignment and train a neural MI estimator. Then, we treat a well-trained explanation generation model as the backbone model and further fine-tune it through reinforcement learning with guidance from the MI estimator, which rewards a generated explanation that is more aligned with the predicted rating or a pre-defined feature of the recommended item. Experiments on three datasets demonstrate that our MMI framework can boost different backbone models, enabling them to outperform existing baselines in terms of alignment with predicted ratings and item features. Additionally, user studies verify that MI-enhanced explanations indeed facilitate users' decisions and are favorable compared with other baselines due to their better alignment properties.
Generating clarifying questions can effectively clarify users' complicated search intent in conversational search systems. However, existing methods based on pre-defined templates are inadequate in understanding explicit user intents, making the generated questions monotonous or inaccurate in some cases. In this paper, we define the ''intent'' of a query as a verb representing the potential behavior, action, or task the user may take. We study generating clarifying questions from a new perspective by incorporating the intents explicitly to form ''intent-aware'' questions with high informativeness and accuracy. Since obtaining gold intent-aware questions is expensive, we propose a rule-based method and a continual learning model to generate intent-aware questions as weak supervision signals. The former leverages search results to mine contextual intent-aware words or phrases, and the latter relies on parallel corpora to paraphrase template-based questions by incorporating the intents. The generated weak supervision data are then applied to fine-tune a BART-based model for end-to-end intent-aware question generation. We also explore prompting a large language model to generate intent-aware questions. Experimental results on a public clarification dataset demonstrate that our proposed methods improve users' search experience compared to existing methods.
Drug-drug interaction (DDI) identification is a crucial aspect of pharmacology research. There are hundreds of DDI types, and they are far from evenly distributed. Some rarely occurring DDI types are often high risk and could be life-critical if overlooked, exemplifying the long-tailed distribution problem. Existing models falter against this distribution challenge and overlook the multi-faceted nature of drugs in DDI prediction. In this paper, a novel multi-modal deep learning-based framework, TFMD, is introduced to leverage multiple properties of a drug for DDI classification. The proposed framework fuses multimodal drug features, including graph-based, molecular structure, target, and enzyme features, for DDI identification. To tackle the challenge posed by the skewed distribution across categories, a novel loss function called the Tailed Focal Loss is introduced, aimed at further enhancing model performance and addressing the gradient vanishing problem of the focal loss on extremely long-tailed datasets. Intensive experiments on four challenging long-tailed datasets demonstrate that TFMD outperforms the most recent SOTA methods in long-tailed DDI classification tasks. The source code is released to reproduce our experimental results: https://github.com/IcurasLW/TFMD_Longtailed_DDI.git
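The abstract does not give the Tailed Focal Loss formula; for orientation, here is a minimal NumPy sketch of the standard multi-class focal loss it builds on, with `gamma` as the usual focusing parameter. The tailed variant proposed in the paper modifies this form to counter vanishing gradients on extremely rare classes; that modification is not shown.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, eps=1e-12):
    """Standard multi-class focal loss: FL = -(1 - p_t)^gamma * log(p_t).

    p: (N, C) predicted class probabilities; y: (N,) integer class labels.
    Well-classified examples (p_t close to 1) are down-weighted, focusing
    training on hard examples.
    """
    p_t = np.clip(p[np.arange(len(y)), y], eps, 1.0)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))
```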
Irregular Time Series (IRTS) data is increasingly prevalent in real-world applications. We observe that IRTS can be divided into two specialized types: Natural Irregular Time Series (NIRTS) and Accidental Irregular Time Series (AIRTS). Existing methods either ignore the impacts of irregular patterns or statically learn the irregular dynamics of NIRTS and AIRTS data, and they suffer from limited data availability due to the sparsity of IRTS. We propose a novel transformer-based framework for general irregular time series data that treats IRTS from four views, Locality, Time, Spatio and Irregularity, so as to exploit the data to its highest potential. Moreover, we design a sophisticated irregularity-gate mechanism to adaptively select task-relevant information from the irregularity, which improves generalization to various IRTS data. We conduct extensive experiments to demonstrate the robustness of our method on three datasets with high missing ratios (88.4%, 94.9%, and 60% missing values) and investigate the significance of the irregularity information for both NIRTS and AIRTS through an additional ablation study. We release our implementation at https://github.com/IcurasLW/MTSFormer-Irregular_Time_Series.git.
Inferring the fine-grained urban flows based on the coarse-grained flow observations is practically important to many smart city-related applications. Adequate data is usually a prerequisite for existing machine learning methods, especially most deep learning models. However, many cities still suffer from the data scarcity issue due to the unbalanced city development levels. To mitigate this issue, we propose a novel cross-city fine-grained urban flow inference model named FGITrans, which aims to effectively transfer the knowledge from the data-rich cities to the data-scarce cities. Specifically, we design a weight-sharing triple-branch transformer framework which adopts self-attention and cross-attention for source/target city feature learning and domain alignment, respectively. Then, we propose a novel spatio-temporal adaptive embedding (STAE) layer for our transformer framework, and introduce a cross-city knowledge distillation (CKD) loss to narrow the cross-city disparities. The CKD loss explicitly enforces the framework to learn the discriminative domain-specific and domain-invariant representations simultaneously. Extensive experiments conducted on four large real-world datasets validate the effectiveness of FGITrans compared with the state-of-the-art baselines.
Inferring fine-grained urban traffic flows from coarse-grained traffic flow observations is practically important to many real applications for smart cities. Existing approaches mostly rely on a large amount of high-quality urban flow data but neglect the data sparsity issue, which is common in real-world scenarios. Therefore, the performance of existing methods may not be promising for cities that lack sufficient traffic flow data. How to design a more generalizable urban flow inference model that is able to effectively transfer knowledge across multiple cities is challenging and remains an open research problem. In this paper, we propose a novel fine-grained urban flow inference model named AdaTM, which leverages the city-specific and city-invariant knowledge extracted from multiple cities. Specifically, we first propose a transformer-based urban feature extraction network named UBFormer to comprehensively extract the spatial-temporal features of multiple source cities. Then, we incorporate a learnable integrator to fuse the city-invariant and city-specific feature representations for the target city with sparse traffic flow data. Finally, we construct the feature representation of the target city through adaptive feature fusion and infer its fine-grained urban flows through the designed urban flow upsampler. Extensive experiments conducted on four large real-world datasets demonstrate the effectiveness of our approach.
Recommender systems (RSs) are susceptible to Interaction-level Membership Inference Attacks (IMIAs), which aim to determine whether specific user-item interactions are present in the training data of the target RS. However, existing IMIAs struggle with inferring the membership of tail interactions, i.e., the interactions involving tail items, due to the limited information available about these items. This paper introduces MINER, a new IMIA designed to enhance attack performance against RSs with long-tailed item distribution. MINER addresses the information scarcity of tail items at both the feature and sample levels. At the feature level, MINER leverages the Knowledge Graphs (KGs) to obtain the auxiliary knowledge of tail items. At the sample level, MINER designs a Bilateral-Branch Network (BBN) as the attack model. The BBN trains two branches independently, with one branch trained on interaction samples with the original long-tailed item distribution and the other on interaction samples with a more balanced item distribution. The outputs of the two branches are aggregated using a cumulative learning component. Our experimental results demonstrate that MINER significantly enhances the attack accuracy of IMIA, especially for tail interactions. Beyond attack design, we design a defense mechanism named RGL to defend against MINER. Empirical evaluations demonstrate that RGL effectively mitigates the privacy risks posed by MINER while preserving recommendation accuracy. Our code is available at https://github.com/dzhong2/MINER.
Multi-modal transportation leverages the advantages of various transportation modes, leading to more efficient urban traveling services. Accurately predicting transfer times between different modes provides guidance for tasks such as trip planning and transportation management. Most existing transfer time prediction works rely on strong assumptions, e.g., predetermined routes, assumed speeds, and predefined downstream transportation timetables. However, these assumptions are hard to hold in practice due to internal factors like individual preferences and external factors like dynamic traffic conditions. These factors are dynamic and vary with location and time, presenting a significant challenge. To address this, we introduce an adaptive transfer time prediction framework, AdaTrans, to forecast personalized transfer times between upstream and downstream transportation modes. Firstly, an attribute learning module is designed to model the trends of internal factors. Then a spatial-temporal adaptive learning component is designed to learn dynamic external factors. Finally, an aggregation component with a capsule network is employed to fuse the influences of these factors. The extensive evaluation results in two real-world datasets demonstrate that AdaTrans effectively harnesses insights from internal and external factors, outperforming state-of-the-art methods by ~20%.
Visual language navigation is an exciting and challenging multi-modal task. Most existing research focuses on the fusion of visual features and the semantic space, ignoring the importance of local highlight features in images and of semantic knowledge alignment for agent navigation. Therefore, this paper proposes a novel visual language model combining Knowledge-augmented Reasoning and Soft-Prompt learning (KRSP). First, we perform fine-grained processing of local regions in the image and map contextual image features and textual knowledge to the same common sub-space. We focus on regional knowledge to increase the model's reasoning ability. Next, soft-prompt learning aligns keywords and sub-visual information in instruction features to solve the path mismatch problem in coarse-grained instructions. We use the large-scale pre-trained model CoCoOp to collect highly matched soft action prompts into a unified instruction set. Finally, we propose a general cross-modal feature alignment loss function. The penalty mechanism of the alignment function draws the potential semantic correlation between sub-visual information and the instruction space closer. We verify the effectiveness of the method on the R2R and REVERIE datasets, and the experimental results show that KRSP achieves state-of-the-art performance; in particular, KRSP improves the SPL evaluation metric by 4.5% in unseen scenarios.
Textual adversarial attack in black-box scenarios is a challenging task, as only the predicted label is available and the text space is discrete and non-differentiable. Current research in this area is still in its infancy and mostly focuses on untargeted attacks, lacking the capability to control the labels of the generated adversarial examples. Meanwhile, existing textual adversarial attack methods primarily rely on word substitution operations to maintain semantic similarity between the adversarial and original examples, which greatly limits the search space for adversarial examples. To address these issues, we propose a novel Lexical-Syntactic Targeted Adversarial Attack method tailored to the black-box setting, referred to as LST2A. Our approach involves adversarial perturbations at different levels of granularity, i.e., word-level perturbations through word substitution operations and syntactic-level perturbations through rewriting the syntax of the examples. Specifically, we first embed the entire text into the embedding layer of a masked language model, and then optimize perturbations at the word level within the hidden state to generate adversarial examples with the target label. For examples that are difficult to attack successfully with only word-level perturbations at higher semantic similarity thresholds, we leverage a Large Language Model (LLM) to introduce syntactic-level perturbations, making these examples more vulnerable near the decision boundary of the victim model. Subsequently, we re-optimize the word-level perturbations for these vulnerable examples. Extensive experiments and human evaluations demonstrate that our proposed method consistently outperforms state-of-the-art baselines, crafting smoother, more grammatically correct adversarial examples.
Missing values are prevalent in multivariate time series, compromising the integrity of analyses and degrading the performance of downstream tasks. Consequently, research has focused on multivariate time series imputation, aiming to accurately impute the missing values based on available observations. A key research question is how to ensure imputation consistency, i.e., intra-consistency between observed and imputed values, and inter-consistency between adjacent windows after imputation. However, previous methods rely solely on the inductive bias of the imputation targets to guide the learning process, ignoring imputation consistency and ultimately resulting in poor performance. Diffusion models, known for their powerful generative abilities, prefer to generate consistent results based on available observations. Therefore, we propose a conditional diffusion model for Multivariate Time Series Consistent Imputation (MTSCI). Specifically, MTSCI employs a contrastive complementary mask to generate dual views during the forward noising process. Then, the intra contrastive loss is calculated to ensure intra-consistency between the imputed and observed values. Meanwhile, MTSCI utilizes a mixup mechanism to incorporate conditional information from adjacent windows during the denoising process, facilitating the inter-consistency between imputed samples. Extensive experiments on multiple real-world datasets demonstrate that our method achieves the state-of-the-art performance on multivariate time series imputation task under different missing scenarios. Code is available at https://github.com/JeremyChou28/MTSCI.
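A rough sketch of how a contrastive complementary mask can be constructed (the exact procedure in MTSCI may differ): the observed entries of a window are split so that what one view hides, the other keeps visible. The function name and `mask_ratio` are illustrative.

```python
import numpy as np

def complementary_masks(obs_mask, mask_ratio=0.2, rng=None):
    """Build two complementary visibility masks over the observed entries.

    obs_mask: boolean/0-1 array marking which entries of the window were
    actually observed. A random subset S of the observed entries is hidden
    in view 1 (and used as imputation targets); view 2 hides the complement
    of S, so the two views jointly cover every observed value.
    """
    rng = np.random.default_rng() if rng is None else rng
    obs = obs_mask.astype(bool)
    hide = (rng.random(obs.shape) < mask_ratio) & obs
    view1_visible = obs & ~hide      # view 1 sees everything except the subset S
    view2_visible = obs & hide       # view 2 sees only S; its complement is hidden
    return view1_visible, view2_visible
```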
In recent years, graph convolution networks (GCNs) have been widely used in recommender systems due to their high-order node information propagation and aggregation mechanisms. However, existing GCN-based recommender systems drop sharply in performance as the depth of the network increases. This phenomenon, called over-smoothing, refers to the fact that the embeddings of all nodes become increasingly similar and indistinguishable. Previous works have rarely explored over-smoothing from the perspective of characteristics specific to recommendation. Specifically, we found experimentally that too many layers can lead to loss values so large that they are difficult to decrease. Through theoretical analysis, we show that this difficulty can be effectively resolved by adding a single hyperparameter, called "power". This hyperparameter effectively controls the smoothness and alleviates the over-smoothing problem. Experiments on four public datasets demonstrate that this hyperparameter can effectively improve performance.
Graph anomaly detection (GAD) aims to find network elements (e.g., nodes, edges) with significantly atypical patterns and has a profound impact in a variety of application domains, including social network analysis, security, Web, finance, and many more. Most of the existing methods have been developed in an unsupervised manner or with extremely limited supervision, due to the high cost of acquiring ground-truth information. Consequently, the identified anomalies may turn out to be noises or uneventful instances because of the lack of prior knowledge on graph anomalies. To address the data scarcity issue in GAD, in this paper, we propose, gADAM, a novel graph neural network-based GAD framework, which consolidates (1) an innovative mixup approach to augment the original training data by adaptively interpolating data instances in the embedding space, and (2) an efficacious sampling method to obtain high-quality negative samples for model training. Additionally, to advance the representation learning for GAD, we further equip the proposed framework with a generic prototype-based learning module. Through extensive empirical evaluations, we corroborate the superiority of the proposed gADAM framework on graph anomaly detection w.r.t. various metrics.
Time series forecasting (TSF) consists of point prediction and probabilistic forecasting. Unlike point forecasting which predicts an expected value of a future target, probabilistic time series forecasting models the uncertainty in data by predicting the distribution of future values, which enhances decision-making flexibility and improves risk management. Traditional probabilistic forecasting methods usually assume a fixed distribution of data, which is not always true for time series. Recently, there have been efforts to adapt diffusion models for time series owing to their exceptional ability to model the distribution of data without prior assumptions. However, how to apply advantages of diffusion models to time series forecasting remains a substantial challenge due to specific issues in time series such as distribution drift and complex dynamic temporal patterns.
In this paper, we focus on the adaptation of diffusion models for time series forecasting. We propose REDI, a recurrent diffusion model that achieves effective probabilistic time series prediction with recurrent forward diffusion process and step-aware guidance in backward denoising process. The recurrent forward diffusion process enables the model to pay more attention to the impact of recent history on future values during the diffusion process, while the step-aware guidance facilitates precise guidance based on historical information during the denoising process. We conduct experiments on 5 real-world datasets and achieve average rankings of 1.8 for deterministic metrics and 1.5 for probabilistic metrics across 12 baselines, which strongly demonstrates the effectiveness of REDI.
Deep models for Multivariate Time Series (MTS) forecasting have recently demonstrated significant success. Channel-dependent models capture complex dependencies that channel-independent models cannot. However, the number of channels in real-world applications outpaces the capabilities of existing channel-dependent models, and, contrary to common expectations, some of them underperform channel-independent models on high-dimensional data, which raises questions about the performance of channel-dependent models. To address this, our study first investigates the reasons behind the suboptimal performance of these channel-dependent models on high-dimensional MTS data. Our analysis reveals two primary issues: noise introduced by unrelated series, which increases the difficulty of capturing the crucial inter-channel dependencies, and challenges in training strategies caused by high-dimensional data. To address these issues, we propose STHD, the Scalable Transformer for High-Dimensional Multivariate Time Series Forecasting. STHD has three components: a) Relation Matrix Sparsity, which limits the introduced noise and alleviates the memory issue; b) ReIndex, applied as a training strategy to enable more flexible batch sizes and increase the diversity of training data; and c) a Transformer that handles 2-D inputs and captures channel dependencies. These components jointly enable STHD to manage high-dimensional MTS while maintaining computational feasibility. Furthermore, experimental results show STHD's considerable improvement on three high-dimensional datasets: Crime-Chicago, Wiki-People, and Traffic. The source code and dataset are publicly available at https://github.com/xinzzzhou/ScalableTransformer4HighDimensionMTSF.git.
Recently, patch-based transformer methods have demonstrated strong effectiveness in time series forecasting. However, the complexity of self-attention imposes heavy demands on memory and compute resources. In addition, although patches can capture comprehensive temporal information while preserving locality, temporal information within patches remains important for time series prediction. Existing methods mainly focus on modeling long-term dependencies across patches while paying little attention to the short-term dependencies within patches. In this paper, we propose the Global and Local Frequency-domain Network (GLFNet), a novel architecture that efficiently learns global time dependencies and local time relationships in the frequency domain. Specifically, we design a frequency filtering layer to learn temporal interactions in place of self-attention. We then devise a dual filtering block, consisting of a global filter block and a local filter block, which learns global dependencies across patches and local dependencies within patches. Experiments on seven benchmark datasets demonstrate that our approach achieves superior performance with improved efficiency.
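To make the idea of a frequency filtering layer concrete, here is a minimal NumPy sketch of frequency-domain token mixing over a patch sequence; the filter `h` would be learnable in a real model, and GLFNet's dual global/local filter blocks are not reproduced here.

```python
import numpy as np

def frequency_filter(x, h):
    """Minimal frequency-domain mixing in place of self-attention.

    x: (seq_len, d_model) real-valued patch embeddings.
    h: (seq_len // 2 + 1, d_model) complex filter (learnable in a real model).
    Element-wise multiplication in the rFFT domain corresponds to a circular
    convolution along the time axis, i.e. global token mixing.
    """
    X = np.fft.rfft(x, axis=0)                       # to frequency domain
    return np.fft.irfft(X * h, n=x.shape[0], axis=0) # filter, then back to time

seq_len, d_model = 16, 8
x = np.random.randn(seq_len, d_model)
h = (np.random.randn(seq_len // 2 + 1, d_model)
     + 1j * np.random.randn(seq_len // 2 + 1, d_model))
y = frequency_filter(x, h)
```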
Submodular optimization finds applications in machine learning and data mining. In this paper, we study the problem of maximizing functions of the form h = f - c, where f is a monotone, non-negative, weakly submodular set function and c is a modular function. We design a deterministic approximation algorithm that makes O((n/ε) log(n/(γε))) oracle calls to the function h and outputs a set S such that h(S) ≥ γ(1-ε) f(OPT) - c(OPT) - (c(OPT)/(γ(1-ε))) log(f(OPT)/c(OPT)), where γ is the submodularity ratio of f. Existing algorithms for this problem either admit a worse approximation ratio or have quadratic runtime. We also present an approximation guarantee for our algorithm when only an approximate oracle of f is available. We validate our theoretical results through extensive empirical evaluations on real-world applications, including vertex cover and influence diffusion problems for submodular utility functions f, and Bayesian A-optimal design for weakly submodular f. Our experimental results demonstrate that our algorithms efficiently achieve high-quality solutions.
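Restating the guarantee in display form, with the parenthesization read off the plain-text statement above (an assumption where the abstract is ambiguous):

```latex
h(S) \;\ge\; \gamma(1-\epsilon)\, f(\mathrm{OPT}) \;-\; c(\mathrm{OPT})
\;-\; \frac{c(\mathrm{OPT})}{\gamma(1-\epsilon)}\,
\log\!\frac{f(\mathrm{OPT})}{c(\mathrm{OPT})},
\qquad
\text{using } O\!\Big(\tfrac{n}{\epsilon}\log\tfrac{n}{\gamma\epsilon}\Big)
\text{ oracle calls to } h.
```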
The integration of multimodal Electronic Health Records (EHR) data has significantly advanced clinical predictive capabilities. Existing models, which utilize clinical notes and multivariate time-series EHR data, often fall short of incorporating the necessary medical context for accurate clinical tasks, while previous approaches with knowledge graphs (KGs) primarily focus on structured knowledge extraction. In response, we propose EMERGE, a Retrieval-Augmented Generation (RAG) driven framework to enhance multimodal EHR predictive modeling. We extract entities from both time-series data and clinical notes by prompting Large Language Models (LLMs) and align them with professional PrimeKG, ensuring consistency. In addition to triplet relationships, we incorporate entities' definitions and descriptions for richer semantics. The extracted knowledge is then used to generate task-relevant summaries of patients' health statuses. Finally, we fuse the summary with other modalities using an adaptive multimodal fusion network with cross-attention. Extensive experiments on the MIMIC-III and MIMIC-IV datasets' in-hospital mortality and 30-day readmission tasks demonstrate the superior performance of the EMERGE framework over baseline models. Comprehensive ablation studies and analysis highlight the efficacy of each designed module and robustness to data sparsity. EMERGE contributes to refining the utilization of multimodal EHR data in healthcare, bridging the gap with nuanced medical contexts essential for informed clinical predictions. We have publicly released the code at https://github.com/yhzhu99/EMERGE.
Electronic Health Records (EHRs) provide valuable patient data but often suffer from sparsity issues, posing significant challenges in predictive modeling. Conventional imputation methods inadequately distinguish between real and imputed data, leading to potential inaccuracies in patient representations. To address these issues, we introduce PRISM, a framework that indirectly imputes data through prototype representations of similar patients, thus ensuring denser and more accurate embeddings. PRISM also includes a feature confidence learner module, which evaluates the reliability of each feature considering missing statuses. Additionally, it incorporates a new patient similarity metric that accounts for feature confidence, avoiding overreliance on imprecise imputed values. Our extensive experiments on the MIMIC-III, MIMIC-IV, PhysioNet Challenge 2012, and eICU datasets demonstrate PRISM's superior performance in predicting in-hospital mortality and 30-day readmission, showcasing its effectiveness in handling EHR data sparsity. For the sake of reproducibility and further research, we have publicly released the code at https://github.com/yhzhu99/PRISM.
Recent joint models for multi-intent detection and slot filling (a.k.a. multi-intent SLU) have obtained promising results by leveraging the semantic similarities or co-occurrence relationships between intent and slot labels. However, a critical aspect frequently neglected by current models is the significant correlation between label co-occurrences and specific scenarios, such as watching a movie or booking a ticket, which is essential for understanding user utterances in multi-intent SLU. In this paper, we propose a new framework dubbed SALA (short for Scenario-aware Label graph interaction), which effectively captures the dynamic co-occurrence relationships among labels across various scenarios, employing a strategy akin to divide-and-conquer. Concretely, SALA first autonomously classifies the scenario of utterances and tracks the co-occurring labels by maintaining a unique co-occurrence matrix for each scenario during the training phase. These scenario-specific co-occurrence matrices are further employed to guide the interactions among label representations through graph propagation, producing accurate predictions. Extensive experiments on two multi-intent SLU benchmark datasets demonstrate the superiority of SALA. More strikingly, SALA also attains competitive results on four additional single-intent and multi-domain SLU benchmark datasets, demonstrating its strong generalizability.
Cross-lingual event detection (CLED) is a challenging information extraction task in which a model is trained in one language and evaluated in another. Most recent methods attack CLED by aligning source and target language representations based on fine-tuning multilingual pre-trained language models. However, they need to modify all the model parameters and store a complete copy for each source-target language pair, which is resource-intensive and requires significant memory. In contrast, prefix-tuning is a more lightweight alternative, but it relies solely on the labeled source language data during training, limiting its performance. To address these problems, we propose a novel framework for CLED with Language-agnostic Prototypical Prefix-Learning (L-APPLE), which integrates language-agnostic event information with prefix-tuning. In detail, inspired by vanilla prompt methods, L-APPLE divides the prefix into two parts: one optimized as continuous word embeddings and the other generated from cross-lingual aligned event prototypes. Meanwhile, we employ language alignment with contrastive learning to acquire cross-lingual aligned event prototypes, and finally, parameters are optimized using both the task and alignment losses. Evaluation on public CLED benchmarks demonstrates that L-APPLE achieves significant improvements in CLED while optimizing less than 0.1% of the parameters compared to previous fine-tuning methods.
Contrastive learning has been extensively studied in sentence representation learning, as it demonstrates effectiveness in various downstream applications; the same sentence with different dropout masks (or other augmentation methods) is considered a positive pair, while other sentences in the same mini-batch serve as negative pairs. However, these methods mostly treat all negative examples equally and overlook the different similarities between the negative examples and the anchors, and thus fail to capture the fine-grained semantic information of the sentences. To address this issue, we explicitly differentiate the negative examples by their similarity with the anchor and propose a simple yet effective method, SoftCSE, that individualizes either the weight or the temperature of each negative pair in the standard InfoNCE loss according to the similarities of the negative examples and the anchors. We further provide a theoretical analysis of our method to show why and how SoftCSE works, including the optimal solution, gradient analysis, and connections with other losses. Empirically, we conduct extensive experiments on semantic textual similarity (STS) and transfer (TR) tasks, as well as text retrieval and reranking, where we observe significant performance improvements compared to strong baseline models.
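As one plausible instantiation of similarity-dependent negative weighting (not necessarily SoftCSE's exact rule), the sketch below re-weights each negative in a standard InfoNCE loss by its similarity to the anchor; `alpha` and the exponential weighting are illustrative choices made for this example.

```python
import numpy as np

def soft_weighted_infonce(z1, z2, temp=0.05, alpha=1.0):
    """InfoNCE loss whose negatives are individually re-weighted by anchor similarity.

    z1, z2: (N, d) L2-normalized embeddings of two views of the same batch;
    z2[i] is the positive for anchor z1[i], all other rows of z2 are negatives.
    Harder (more anchor-similar) negatives receive larger weights in the
    denominator; the positive keeps weight 1.
    """
    cos = z1 @ z2.T                              # (N, N) cosine similarities
    sim = cos / temp
    N = sim.shape[0]
    eye = np.eye(N, dtype=bool)
    w = np.where(eye, 1.0, np.exp(alpha * cos))  # per-negative weights
    exp_terms = np.exp(sim) * w
    loss = -np.log(exp_terms[eye] / exp_terms.sum(axis=1))
    return float(loss.mean())
```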
Understanding emotions in dialogue is an essential part of human communication; it is an extremely complex cognitive process involving cross-modal interactions and cross-emotional associations. Multimodal sarcasm detection is an emerging but challenging research task in this space: it aims to identify sarcasm in video discourse by incorporating appropriate contextual information and external knowledge and by understanding both verbal and non-verbal components. However, existing research primarily focuses on constructing multimodal fusion representations and capturing incongruity between modalities as indicative cues for recognizing sarcasm, relying on fixed network architectures that struggle to cope with the complex and diverse sarcastic scenarios of real life. As humans, we rely on the combination of visual and auditory cues, such as facial expressions and intonation, to understand information. Our brains are implicitly trained to integrate information from multiple senses to form a comprehensive understanding of conveyed messages, a process known as multi-sensory integration. The combination of different modalities not only provides additional information but also amplifies the information conveyed by each modality relative to the others. Therefore, dynamic variation in the weights of different modalities plays a crucial role in multi-modal understanding. From this perspective, we propose a new framework called Multi-view BART (MV-BART), which exploits multi-granularity cues from multiple viewpoints and dynamically adjusts the view weights to suit different sarcastic scenarios. We analyze the proposed framework by testing it on several benchmark datasets, and the results outperform the existing state of the art.
Event Detection (ED), a crucial component of comprehensive text analysis tools, is a well-established task within the fields of Natural Language Processing (NLP) and Information Extraction (IE). Current state-of-the-art models for ED primarily focus on identifying a limited set of predefined event types. Recently, the challenge of detecting a broad array of predefined event types has garnered increasing interest within the IE community. However, a significant gap in existing research on ED with extensive ontologies is the inadequate exploration of how interactions between event types affect ED model performance. One of the hindrances for this purpose is the lack of resources to encode event-event dependencies for large ontologies. This study introduces a novel approach that leverages existing inter-event dependency resources to provide this information for extensive ontologies. Specifically, a solution based on Optimal Transport is proposed to map event-event dependency from existing resources to a large ontology. We conduct extensive experiments on multiple benchmark datasets to assess the effectiveness of our approach. Our findings, supported by a thorough analysis, demonstrate that this innovative technique significantly enhances the performance of ED models, especially for ontologies with a large number of event types.
Several security and workflow applications require provenance information at the operating system level for diagnostics. The resulting provenance traces are often more informative if they are efficiently mapped to execution paths within the control flow graph. However, current provenance systems do not map traces to control flow graphs for diagnostics purposes due to the computational complexity of mapping traces to graphs. We formulate the path prediction problem for provenance traces and take a machine learning approach to solve the problem. We develop a transformer-based graph convolutional network to predict paths. Our experiments demonstrate that our machine learning model achieves more than twice the accuracy on average compared to simple probabilistic models, with an increased computation time trade-off.
Multivariate time series classification is an important task with widespread domains of application. Recently, deep neural networks (DNNs) have achieved state-of-the-art performance in time series classification. However, they often require large expert-labeled training datasets, which can be infeasible in practice. In few-shot settings, i.e., when only a limited number of samples per class is available in the training data, DNNs show a significant drop in testing accuracy and poor generalization ability. In this paper, we propose to address these problems from an optimization and a loss function perspective. Specifically, we propose a new learning framework named COSCO, consisting of a sharpness-aware minimization (SAM) optimization and a prototypical loss function, to improve the generalization ability of DNNs for multivariate time series classification under the few-shot setting. Our experiments demonstrate that our proposed method outperforms the existing baseline methods. Our source code is available at: https://github.com/JRB9/COSCO.
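For reference, here is a minimal NumPy version of a prototypical loss of the kind COSCO combines with SAM: class prototypes are support-set means, and queries are classified by a softmax over negative squared distances. The sharpness-aware optimization step itself is not shown, and the function and argument names are illustrative.

```python
import numpy as np

def prototypical_loss(support_emb, support_y, query_emb, query_y):
    """Prototypical network loss over embedded support and query samples.

    support_emb: (S, d), support_y: (S,), query_emb: (Q, d), query_y: (Q,).
    Prototypes are per-class means of support embeddings; queries are scored
    by negative squared Euclidean distance to each prototype.
    """
    classes = np.unique(support_y)
    protos = np.stack([support_emb[support_y == c].mean(axis=0) for c in classes])
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)   # (Q, C)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)                # stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.searchsorted(classes, query_y)       # map labels to prototype rows
    return float(-log_p[np.arange(len(query_y)), idx].mean())
```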
We consider the fractional influence maximization problem, i.e., identifying users on a social network to be incentivized with potentially partial discounts to maximize the influence on the network. The larger the discount given to a user, the higher the likelihood that the user is activated (adopts a new product or innovation); an activated user then attempts to activate its neighboring users, causing a cascade of influence through the network. Our goal is to devise efficient algorithms that assign initial discounts to the network's users to maximize the total number of activated users at the end of the cascade, subject to a constraint on the total sum of discounts given. In general, the activation likelihood could be any non-decreasing function of the discount; our focus lies on the case where the activation likelihood is an affine function of the discount, potentially varying across different users. As this problem is shown to be NP-hard, we propose and analyze an efficient (1-1/e)-approximation algorithm. Furthermore, we run experiments on real-world social networks to show the performance and scalability of our method.
Today we are able to generate a large set of text representations, from the simple Bag-of-Words (BOW) to recent transformers capturing the semantic and contextual meaning of text. It has been shown that there is no single best text representation for the text clustering task. Consequently, some works have combined text representations using a consensus clustering approach. Two types of consensus approaches exist, namely explicit and implicit consensus. In explicit consensus, also known as ensemble clustering, the consensus function is applied a posteriori, after obtaining cluster labels from clustering each text representation, allowing global mutual information between the partitions of all text representations to be captured. On the other hand, implicit consensus uses tensor clustering, which works on the similarity matrices of the text representations, to optimize the consensus partition.
In this paper, we propose a new consensus text clustering algorithm named IEcons (Implicit-Explicit consensus) that optimizes explicit and implicit consensus clustering simultaneously, using text embeddings and a tensor representation of texts built from their similarity matrices. We compare our algorithm with others from the literature on five different textual datasets using several performance criteria. The comparison results reveal that our algorithm performs best in most situations.
Leveraging current legal standards, we define bias through the lens of marginal benefits and objective testing with the novel metric "Objective Fairness Index". This index combines the contextual nuances of objective testing with metric stability, providing a legally consistent and reliable measure. Utilizing the Objective Fairness Index, we provide fresh insights into sensitive machine learning applications, such as COMPAS (recidivism prediction), highlighting the metric's practical and theoretical significance. The Objective Fairness Index allows one to differentiate between discriminatory tests and systemic disparities.
Learned sparse representations form an effective and interpretable class of embeddings for text retrieval. While exact top-k retrieval over such embeddings faces efficiency challenges, a recent algorithm called Seismic has enabled remarkably fast, highly-accurate approximate retrieval. Seismic statically prunes inverted lists, organizes each list into geometrically-cohesive blocks, and augments each block with a summary vector. At query time, each inverted list associated with a query term is traversed one block at a time in an arbitrary order, with the inner product between the query and summaries determining if a block must be evaluated. When a block is deemed promising, its documents are fully evaluated with a forward index. Seismic is one to two orders of magnitude faster than state-of-the-art inverted index-based solutions and significantly outperforms the winning graph-based submissions to the BigANN 2023 Challenge. In this work, we speed up Seismic further by introducing two innovations to its query processing subroutine. First, we traverse blocks in order of importance, rather than arbitrarily. Second, we take the list of documents retrieved by Seismic and expand it to include the neighbors of each document using an offline k-regular nearest neighbor graph; the expanded list is then ranked to produce the final top-k set. Experiments on two public datasets show that our extension, named SeismicWave, can reach almost-exact accuracy levels and is up to 2.2x faster than Seismic.
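The neighbor-expansion step can be illustrated with a short sketch: assuming `knn_graph` maps each document to its offline k-regular nearest neighbors and `doc_vectors` provides the vectors used for exact scoring, the initial candidate list is expanded with neighbors and the union is reranked by exact inner product. Names and data structures here are illustrative, not SeismicWave's actual implementation.

```python
import numpy as np

def expand_and_rerank(query, candidates, knn_graph, doc_vectors, k=10):
    """Expand an initial candidate list with kNN-graph neighbors, then rerank.

    query: query vector; candidates: iterable of doc ids from the first pass;
    knn_graph[d]: list of neighbor doc ids of document d (built offline);
    doc_vectors[d]: vector of document d used for exact inner-product scoring.
    """
    pool = set(candidates)
    for d in candidates:
        pool.update(knn_graph[d])           # add graph neighbors of each candidate
    pool = list(pool)
    scores = np.array([query @ doc_vectors[d] for d in pool])
    order = np.argsort(-scores)[:k]         # exact rerank of the expanded pool
    return [(pool[i], float(scores[i])) for i in order]
```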
Learned dense representations are a popular family of techniques for encoding queries and documents using high-dimensional embeddings, which enable retrieval by performing approximate k nearest-neighbors search (A-kNN). A popular technique for making A-kNN search efficient is based on a two-level index, where the embeddings of documents are clustered offline and, at query processing, a fixed number N of clusters closest to the query is visited exhaustively to compute the result set.
In this paper, we build upon the state of the art in early-exit A-kNN and propose an unsupervised method based on the notion of patience, which reaches competitive effectiveness with large efficiency gains. Moreover, we discuss a cascade approach where we first identify queries that find their nearest neighbor within the closest τ << N clusters, and then decide how many more clusters to visit based on our patience approach or other state-of-the-art strategies. Reproducible experiments employing state-of-the-art dense retrieval models and publicly available resources show that our techniques improve A-kNN efficiency with up to 5× speedups while incurring negligible effectiveness losses. All the code used is available at https://github.com/francescobusolin/faiss_pEE
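A minimal sketch of the patience idea on a two-level index is given below, under the assumption that clusters are probed from closest to farthest and that the search stops once the best nearest neighbor has not improved for a fixed number of consecutive clusters; the paper's actual stopping statistic and the cascade with τ may differ, and the sketch handles only k = 1 for brevity.

```python
# Hedged sketch of patience-based early exit over a two-level (clustered) index:
# clusters are visited from closest to farthest, and probing stops once the
# current nearest neighbor has not improved for `patience` consecutive clusters.
import numpy as np

def patience_knn(query, centroids, cluster_points, n_probe, patience):
    """centroids: (C, d) array; cluster_points: list of (m_i, d) arrays per cluster."""
    order = np.argsort(np.linalg.norm(centroids - query, axis=1))[:n_probe]
    best_dist, best_vec, stale = np.inf, None, 0
    for c in order:
        pts = cluster_points[c]
        d = np.linalg.norm(pts - query, axis=1)
        i = int(np.argmin(d))
        if d[i] < best_dist:
            best_dist, best_vec, stale = d[i], pts[i], 0   # improvement: reset patience counter
        else:
            stale += 1
            if stale >= patience:                          # no improvement for `patience` clusters: exit early
                break
    return best_vec, best_dist
```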
Sketch algorithms are crucial for identifying top-k items in large-scale data streams. Existing methods often compromise between performance and accuracy and cannot efficiently handle increasing data volumes with limited memory. We present Bubble Sketch, a compact algorithm that excels in both performance and accuracy. Bubble Sketch achieves this by (1) recording only the full keys of hot items, significantly reducing memory usage, and (2) using threshold relocation to resolve conflicts, enhancing detection accuracy. Unlike traditional methods, Bubble Sketch eliminates the need for a Min-Heap, ensuring fast processing speeds. Experiments show that Bubble Sketch outperforms the seven other algorithms compared, with the highest throughput and precision, and surpasses HeavyKeeper in accuracy by up to two orders of magnitude.
Unsupervised feature selection (UFS) methods have garnered significant attention for their capability to eliminate redundant features without relying on class label information. However, their scalability to large datasets remains a challenge, rendering common UFS methods impractical for such applications. To address this issue, we introduce QMR-FS, a greedy forward filtering approach that selects linearly independent features up to a specified relative tolerance, ensuring that any excluded features can be reconstructed from the retained set within this tolerance. This is achieved through the QMR matrix decomposition, which builds upon the well-known QR decomposition. QMR-FS benefits from linear complexity relative to the number of instances and boasts exceptional performance due to its ability to leverage parallelized computation on both CPU and GPU. Despite its greedy nature, QMR-FS achieves comparable classification and clustering accuracies across multiple datasets when compared to other UFS methods, while achieving runtimes approximately 10 times faster than recently proposed scalable UFS methods for datasets ranging from 100 million to 1 billion elements.
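The selection criterion can be illustrated with a plain Gram-Schmidt forward pass, shown below, which keeps a feature only if it cannot be reconstructed from the already-selected features within a relative tolerance; this stands in for the paper's QMR decomposition and its parallelized CPU/GPU implementation.

```python
# Hedged sketch of the selection criterion behind QMR-FS: greedily keep a feature
# only if it cannot be reconstructed from the already-selected features within a
# relative tolerance. A plain Gram-Schmidt pass is used here purely for illustration.
import numpy as np

def greedy_independent_features(X, rel_tol=0.05):
    """X: (n_samples, n_features). Returns indices of selected feature columns."""
    n, d = X.shape
    selected, basis = [], []                       # basis holds an orthonormal span of selected columns
    for j in range(d):
        col = X[:, j].astype(float)
        residual = col.copy()
        for q in basis:                            # project out the span of already-selected features
            residual -= (q @ col) * q
        norm = np.linalg.norm(col)
        if norm > 0 and np.linalg.norm(residual) > rel_tol * norm:
            selected.append(j)                     # feature is not reproducible within tolerance: keep it
            basis.append(residual / np.linalg.norm(residual))
    return selected
```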
In this work, we investigate the capability of Graph Attention Networks for extracting aspect and opinion terms. Aspect and opinion term extraction is posed as a token-level classification task akin to named entity recognition. We use the dependency tree of the input query as an additional feature in a Graph Attention Network, along with token and part-of-speech features. We show that the dependency structure is a powerful feature that, in the presence of a CRF layer, substantially improves performance and yields the best results on the commonly used datasets from SemEval 2014, 2015, and 2016. We also experiment with additional layers such as BiLSTM and Transformer on top of the CRF layer. Finally, we show that our approach works well in the presence of multiple aspects or sentiments in the same query, and that it is not necessary to modify the dependency tree based on a single aspect, as was done in its original application to sentiment classification.
This paper addresses the common challenge of system performance degradation due to speech inconsistency and mismatched acoustic conditions across various domains in speaker verification tasks. We propose a Noise-Aware Quality Network designed to estimate a score based on speech quality and the presence of speech obscured by noise in real-world environments. The score, derived from the normalization of estimated speech quality evaluations, is incorporated into a proposed Noise-Aware Quality loss function, aiming to prioritize speech quality by weighting the embedding distances based on the quality score. Our methodology significantly improves speaker verification performance, particularly in noisy environments. Furthermore, our work highlights the importance of speech quality and the potential benefits of incorporating speech quality weight into the loss function for speaker verification tasks.
Opinion mining, specifically in the investment sector, has seen a significant increase in interest over recent years. This paper presents a novel approach to overcoming current limitations in assessing and ranking investor opinions based on profitability. The study introduces a pre-finetuning scheme to improve language models' capacity to distinguish professionalism, thus enabling the ranking of all available opinions. Furthermore, the paper evaluates ranking results using traditional metrics and shows that a pairwise setting performs better than a regression setting. Lastly, our method is shown to be effective across various investor opinion tasks, encompassing both professional and amateur investors. The results indicate that this approach significantly enhances the efficiency and accuracy of opinion mining in the investment sector.
Document layout generation, a burgeoning field of document intelligence, entails positioning and sizing various elements within given constraints. While significant strides have been made in single-page layout generation, real-world documents predominantly span multiple pages, and exploring multi-page layout generation has become key to meeting today's sharply increased document-processing demands. Despite the promise of leveraging large language models (LLMs) like GPT-4 for their powerful in-context learning abilities, transitioning the task to multi-page layouts, which involve considerably more complex data, presents formidable challenges, including excessively long prompts and strict consistency requirements between pages. To this end, we propose a novel framework called Multi-Page Layout Generation via Consistency-Oriented modeling (MuLCO) that capitalizes on the in-context learning of LLMs without the need for training or fine-tuning. MuLCO employs three key components: serialization based on code blocks maps intricate document layouts to code-style exemplars; a self-correcting reasoning hint decomposes the complex generation task into numerous steps to improve reasoning interpretability; and consistency-oriented multi-round generation predicts coherent multi-page layouts in the form of a continuous dialogue. To summarize, we contribute by proposing MuLCO and developing a task-specific dataset and evaluation mechanism. Extensive experiments validate the effectiveness of the MuLCO framework for multi-page layout generation.
Existing news recommendation systems often overlook the diversity of recommended content and exhibit popularity bias, resulting in suboptimal performance. To address this issue, this paper introduces a novel news recommendation approach, Popularity- and Position-Aware Contrastive Learning for Retrieval-Driven News Recommendation (PP4RNR). It consists of two modules: Entity-Level Retrieval Augmentation (ERA) and Popularity- and Position-Aware Contrastive Learning (PPCL). The ERA module utilizes both entities and titles to retrieve relevant news. Subsequently, retrieval-augmented news is fused with candidate news using our innovative cascaded attention network, leading to richer and more diverse news semantics. The PPCL module introduces perturbations in the news representation using a Gaussian perturbation vector based on the popularity and position information and then employs contrastive learning to regularize the representation space. Hence, this approach not only deepens the understanding of content diversity but also implicitly mitigates the popularity bias prevalent in current models. Rigorous testing on benchmark datasets demonstrates that our method significantly outperforms a range of state-of-the-art techniques.
Dataset Distillation (DD) is a technique for synthesizing smaller, compressed datasets from large original datasets while retaining essential information to maintain efficacy. Efficient DD is a current research focus among scholars. Squeeze, Recover and Relabel (SRe2L) and Adversarial Prediction Matching (APM) are two advanced and efficient DD methods, yet their performance is moderate at low volumes of distilled data. This paper proposes an improvement method, Distributed Boosting (DB), capable of significantly enhancing the performance of these two algorithms at low distillation volumes, leading to DB-SRe2L and DB-APM. Specifically, DB is divided into three stages: Distribute & Encapsulate, Distill, and Integrate & Mix-relabel. DB-SRe2L, compared to SRe2L, demonstrates performance improvements of 25.2%, 26.9%, and 26.2% on full 224×224 ImageNet-1k at Images Per Class (IPC) 10, CIFAR-10 at IPC 10, and CIFAR-10 at IPC 50, respectively. Meanwhile, DB-APM, in comparison to APM, exhibits performance enhancements of 21.2% and 20.9% on CIFAR-10 at IPC 10 and CIFAR-100 at IPC 1, respectively. Additionally, we provide a theoretical proof of convergence for DB. To the best of our knowledge, DB is the first DD method suitable for distributed parallel computing scenarios.
Traffic allocation is the process of redistributing natural traffic to products by adjusting their positions in the post-search phase, aiming to effectively foster merchant growth, precisely meet customer demands, and maximize the interests of all parties on e-commerce platforms. Existing methods based on learning to rank neglect the long-term value of traffic allocation, whereas reinforcement learning approaches struggle to balance multiple objectives and suffer from cold starts in real-world data environments. To address these issues, this paper proposes MODRL-TA, a multi-objective deep reinforcement learning framework for traffic allocation consisting of multi-objective Q-learning (MOQ), a decision fusion algorithm (DFM) based on the cross-entropy method (CEM), and a progressive data augmentation system (PDA). Specifically, MOQ constructs an ensemble of RL models, each dedicated to one objective, such as click-through rate or conversion rate. These models individually determine the position of items as actions, aiming to estimate the long-term value of multiple objectives from an individual perspective. We then employ DFM to dynamically adjust weights among objectives to maximize long-term value, addressing temporal dynamics in objective preferences in e-commerce scenarios. Initially, PDA trains MOQ with simulated data from offline logs; as experiments progress, it strategically integrates real user interaction data, ultimately replacing the simulated dataset to alleviate distributional shift and the cold-start problem. Experimental results on real-world online e-commerce systems demonstrate the significant improvements of MODRL-TA, and we have successfully deployed MODRL-TA on an e-commerce search platform.
The stock trend prediction problem refers to forecasting future stock price trends. In recent years, some methods have discovered causal relations between stocks to address this problem. However, traditional causal discovery methods face unique challenges in the stock market, as they fail to uncover accurate causal relationships when a distribution shift occurs in stock data. Additionally, current methods overlook the commonalities and differences between stock relations. To address these shortcomings, we propose a causal-enhanced multi-view temporal graph model, named CMG. This method explores comprehensive causal relations by incorporating a distribution-shift confounder and constructs a multi-view contrastive learning module to unearth the commonalities and differences between stock relations, thereby enabling more accurate stock trend predictions. Experimental results and investment simulations demonstrate the effectiveness and profitability of CMG.
The choice of optimization objective is pivotal in the design of a recommender system, as it shapes how a user's intent is modeled from previous interactions. Existing approaches mainly adhere to three categories of loss functions: pairwise, pointwise, and setwise. Despite their effectiveness, a critical and common drawback of such objectives is that they treat the next observed item as the sole positive while considering all remaining items equally negative. Such binary label assignment only ensures a higher recommendation score for the positive item, neglecting potential structure induced by varying preferences among the other unobserved items. To alleviate this issue, we propose a novel method that extends the original objectives to explicitly leverage different levels of preference as relative orders between their scores. Finally, we demonstrate the superior performance of our method compared to the baseline objectives.
Automatic Chart Question Answering (ChartQA) is challenging due to the complex distribution of chart elements and the fact that patterns of the underlying data are not explicitly displayed in charts. To address this challenge, we design a joint multimodal scene graph for charts to explicitly represent the relationships between chart elements and their patterns. Our proposed multimodal scene graph includes a visual graph and a textual graph to jointly capture structural and semantic knowledge from the chart. This graph module can be easily integrated with different vision transformers as an inductive bias. Our experiments demonstrate that incorporating the proposed graph module enhances the understanding of the structure and semantics of chart elements, thereby improving performance on the publicly available ChartQA and OpenCQA benchmarks.
Regression models are of fundamental importance in explicitly explaining the response variable in terms of covariates. However, the point predictions of these models limit their use in many real-world applications. Heteroscedasticity is common in most real-world scenarios and is hard to model due to its randomness. A standard Gaussian process (GP) captures epistemic (model) uncertainty but fails to capture heteroscedastic aleatoric uncertainty. The heteroscedastic GP (HetGP) framework inherently captures both by placing independent GP priors on the mean function and the error term. We propose fitting a post-hoc HetGP on the residuals of a trained deterministic neural network to obtain both epistemic and aleatoric uncertainty. The advantage of the post-hoc HetGP on residuals is that it can be extended to any type of model, since the underlying model is treated as a black box that provides point predictions. We demonstrate our approach through simulation studies and UCI regression datasets. The code is available at https://visdomlab.github.io/HetGP/
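A much-simplified, decoupled stand-in for the idea is sketched below: one GP is fitted to the residual mean and a second GP to the log of squared residuals as a proxy for input-dependent noise; full HetGP inference couples the two GPs, so this two-stage fit is only illustrative, and the kernel choices are assumptions.

```python
# Hedged, simplified stand-in for a post-hoc heteroscedastic GP on residuals.
# This decoupled two-stage fit is for illustration only and is not full HetGP inference.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def posthoc_hetgp(X_train, y_train, predict_fn, X_test):
    """predict_fn: the trained black-box model's point-prediction function."""
    residuals = y_train - predict_fn(X_train)

    mean_gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    mean_gp.fit(X_train, residuals)
    res_mean, res_std = mean_gp.predict(X_test, return_std=True)   # epistemic spread of the residual mean

    noise_gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    noise_gp.fit(X_train, np.log(residuals**2 + 1e-8))
    aleatoric_var = np.exp(noise_gp.predict(X_test))               # input-dependent noise level

    y_mean = predict_fn(X_test) + res_mean
    total_std = np.sqrt(res_std**2 + aleatoric_var)
    return y_mean, total_std
```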
Self-supervised Pretrained Models (PTMs) have demonstrated remarkable performance in computer vision and natural language processing tasks. These successes have prompted researchers to design PTMs for time series data. In our experiments, most self-supervised time series PTMs were surpassed by simple supervised models. We hypothesize this undesired phenomenon may be caused by data scarcity. Our results indicate that replacing a real-data pretraining set with a greater volume of only generated samples produces noticeable improvement.
The field of autonomous vehicles (AVs) predominantly leverages multi-modal integration of LiDAR and camera data to achieve better performance than a single modality. However, the fusion process encounters challenges in detecting distant objects due to the disparity between the high resolution of cameras and the sparse data from LiDAR. Insufficient integration of global perspectives with local-level details results in sub-optimal fusion performance. To address this issue, we have developed an innovative two-stage fusion process called Quantum Inverse Contextual Vision Transformers (Q-ICVT). This approach draws on the concept of adiabatic computing from quantum computation to create a novel reversible vision transformer known as the Global Adiabatic Transformer (GAT). GAT aggregates sparse LiDAR features with semantic features from dense images for global cross-modal integration. Additionally, the Sparse Expert of Local Fusion (SELF) module maps the sparse LiDAR 3D proposals and encodes position information of the raw point cloud onto the dense camera feature space using a gating point fusion approach. Our experiments show that Q-ICVT achieves an mAPH of 82.54 for L2 difficulties on the Waymo dataset, improving by 1.88% over current state-of-the-art fusion methods. We also analyze GAT and SELF in ablation studies to highlight the impact of Q-ICVT. Our code is available at https://github.com/sanjay-810/Qicvt
With the introduction of large language models (LLMs), automatic math reasoning has seen tremendous success. However, current methods primarily focus on providing solutions or using techniques like Chain-of-Thought to enhance problem-solving accuracy. In this paper, we focus on improving the capability of mathematics teaching via a Socratic teaching-based LLM (SocraticLLM), which guides learners toward profound thinking with clarity and self-discovery via conversation. We collect and release a high-quality mathematical teaching dataset, named SocraticMATH, which provides Socratic-style conversations of problems with extra knowledge. Also, we propose a knowledge-enhanced LLM as a strong baseline to generate reliable responses with review, guidance/heuristic, rectification, and summarization. Experimental results show the great advantages of SocraticLLM by comparing it with several strong generative models. The codes and datasets are available on https://github.com/ECNU-ICALK/SocraticMath.
We investigate Graph Neural Networks (GNNs) on heterophilous graphs for node classification. To address the scarcity of useful local information in heterophilous neighborhoods, it is often essential to explore global interactions. However, many existing methods for doing so are computationally expensive and may suffer from issues like oversquashing. In addition, earlier studies show that GNNs can be outperformed by Multi-Layer Perceptrons on heterophilous graphs, indicating insufficient exploitation of node feature information. To address these limitations, we propose Prototype Mediated GNN (PM-GNN), a novel framework that efficiently captures global feature information using class prototypes. PM-GNN learns multiple prototypes for each class from raw node features with a soft k-means clustering mechanism. These prototypes are then transferred onto node embeddings via explicit message passing, bypassing local neighborhoods and mitigating oversquashing. PM-GNN scales to large graphs and outperforms strong baselines on multiple heterophilous datasets.
This paper investigates the factuality of large language models (LLMs) as knowledge bases in the legal domain, in a realistic usage scenario: we allow for acceptable variations in the answer, and let the model abstain from answering when uncertain. First, we design a dataset of diverse factual questions about case law and legislation. We then use the dataset to evaluate several LLMs under different evaluation methods, including exact, alias, and fuzzy matching. Our results show that the performance improves significantly under the alias and fuzzy matching methods. Further, we explore the impact of abstaining and in-context examples, finding that both strategies enhance precision. Finally, we demonstrate that additional pre-training on legal documents, as seen with SaulLM, further improves factual precision from 63% to 81%.
Many tangible and intangible objects are represented as itemsets, i.e., compositions of individual items. In this paper, we address the problem of finding embeddings of such items so that those embeddings can be used in tasks like missing-item prediction. We approach this problem by means of determinantal point processes (DPPs) in order to reflect the diversity within each set. Doing so requires optimizing the log determinant of a symmetric positive definite (SPD) matrix. The standard practice is to perform a low-rank decomposition of the matrix and derive update rules for the low-rank factor. In this work, we instead approach the problem through item embeddings: we learn the SPD matrix by finding suitable vector representations of the given data under a fixed kernel function. To this end, we propose a novel algorithm to accurately compute the gradients of the log determinant with respect to the embedding vectors. We also show that our approach outperforms Autodiff-based learning in terms of gradient direction and running time, and that it can be applied to other general log-determinant optimization problems.
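For the simplest possible kernel choice, the gradient has a closed form that can be sketched directly; assuming (for illustration only) a linear kernel K = V V^T + εI, one gets ∂ log det(K)/∂V = 2 K^{-1} V, as in the snippet below.

```python
# Hedged sketch: closed-form gradient of log det(K) with respect to embedding
# vectors V, assuming (for illustration) a linear kernel K = V V^T + eps*I.
# The paper works with a general fixed kernel; this is only the simplest case.
import numpy as np

def logdet_and_grad(V, eps=1e-3):
    """V: (n_items, dim) embedding matrix for one itemset."""
    n = V.shape[0]
    K = V @ V.T + eps * np.eye(n)                  # SPD kernel matrix of the itemset
    sign, logdet = np.linalg.slogdet(K)
    grad_V = 2.0 * np.linalg.solve(K, V)           # d logdet / dV = 2 K^{-1} V for the linear kernel
    return logdet, grad_V

# quick finite-difference check of one entry
V = np.random.randn(5, 3)
ld, g = logdet_and_grad(V)
h = 1e-6
Vp = V.copy()
Vp[0, 0] += h
assert abs((logdet_and_grad(Vp)[0] - ld) / h - g[0, 0]) < 1e-3
```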
Graph Neural Networks (GNNs) have achieved remarkable success across various domains, yet recent studies have exposed their vulnerability to backdoor attacks. Backdoor attacks inject triggers into the training set to poison the model, with adversaries typically relabeling trigger-attached training samples to a target label. This leads a GNN trained on the poisoned dataset to misclassify any test sample containing the backdoor trigger as the target label. However, relabeling not only increases the cost of the attack but also raises the risk of detection. Therefore, our study focuses on clean-label backdoor attacks, which do not require modifying the labels of trigger-attached samples in the training phase. Specifically, we employ a novel method to select effective poisoned samples belonging to the target class. An adaptive trigger generator is further deployed to achieve high attack success rates under a small backdoor budget. Our experiments on four public datasets validate the effectiveness of the proposed attack.
We present General Time Transformer (GTT), an encoder-only foundation model for zero-shot multivariate time series forecasting. GTT is pretrained on a large dataset of 200M high-quality time series samples spanning diverse domains. In our framework, we treat multivariate time series as a distinct category of images characterized by varying numbers of channels, and represent each time series sample as a sequence of non-overlapping curve shapes (patches) within a unified numerical magnitude. Furthermore, we formulate multivariate time series forecasting as the problem of predicting the next curve shape from a window of past curve shapes on a channel-wise basis. Experimental results demonstrate that GTT exhibits superior zero-shot multivariate forecasting capabilities on unseen time series datasets, even surpassing state-of-the-art supervised baselines. Additionally, we investigate the impact of varying GTT model parameters and training dataset scales, observing that the scaling law also applies in the context of zero-shot multivariate time series forecasting. The codebase of GTT is available at https://github.com/cfeng783/GTT.
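A minimal sketch of the channel-wise patching view described above is given below; the z-score normalization used to place channels on a common magnitude, and the specific patch and context lengths, are assumptions for illustration.

```python
# Hedged sketch of channel-wise patching: each channel is normalized to a common
# magnitude (z-score here, as an assumption), split into non-overlapping patches,
# and framed as "past window of patches -> next patch" examples.
import numpy as np

def make_patch_examples(series, patch_len=16, context_patches=8):
    """series: (n_channels, length) array. Returns (inputs, targets) stacked over channels."""
    c, length = series.shape
    n_patches = length // patch_len
    x = series[:, :n_patches * patch_len]
    x = (x - x.mean(axis=1, keepdims=True)) / (x.std(axis=1, keepdims=True) + 1e-8)
    patches = x.reshape(c, n_patches, patch_len)            # non-overlapping curve shapes
    inputs, targets = [], []
    for ch in range(c):                                      # channel-wise next-patch prediction
        for t in range(context_patches, n_patches):
            inputs.append(patches[ch, t - context_patches:t])
            targets.append(patches[ch, t])
    return np.stack(inputs), np.stack(targets)
```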
Changes made to webpages can affect their retrievability. Often this is done with the intention of increasing the page's search engine ranking to improve overall access to information on the page. The Environmental Data and Governance Initiative (EDGI) created a dataset that describes changes on US federal environmental webpages between 2016 and 2020. EDGI noted that many environmental terms were deleted from the pages, but without user data, claims that page retrievability and public information access were lowered are only anecdotal. The Open Resource for Click Analysis in Search (ORCAS) dataset was created during the same time frame, from 2017 to 2020, and enables high quality user intent analysis without compromising on user privacy protection. We present an analysis of the intersection of the EDGI dataset and the ORCAS dataset, matching changes on federal environmental webpages with their associated queries. We use web archives and a change-text indexing system to link changes in term frequency on the pages with the queries. We find that the pages contain fewer query terms in 2020 than in 2016, lowering the pages' retrievability. The analysis provides substantive support of EDGI's claim that federal environmental pages were made less accessible between 2016 and 2020.
Recent studies have found that many VQA models are influenced by biases, preventing them from effectively using multimodal information for reasoning. Consequently, these methods, which perform well on standard VQA datasets, exhibit underwhelming performance on the bias-sensitive VQA-CP dataset. Although numerous past studies have focused on mitigating biases in VQA models, most have only considered language bias. In this paper, we address bias in the VQA task by targeting its various sources. Specifically, to counteract shortcut biases, we integrate a bias detector capable of capturing both vision and language biases, and we reinforce its ability to capture biases using a generative adversarial network and knowledge distillation. To combat distribution bias, we use a cosine classifier to obtain a cosine feature branch from the base model, training it with an adaptive angular margin loss based on answer frequency and difficulty, along with a supervised contrastive loss to enhance the model's classification ability in the feature space. In the prediction stage, we fuse the cosine features with the prediction of the base model to obtain the final prediction. Finally, extensive experiments demonstrate that our approach, SD-VQA, achieves state-of-the-art performance on the VQA-CPv2 dataset without using any data balancing, and achieves competitive results on the VQAv2 dataset.
Scalability is a major challenge in modern recommender systems. In sequential recommendation, full Cross-Entropy (CE) loss achieves state-of-the-art recommendation quality but consumes excessive GPU memory with large item catalogs, limiting its practicality. Using a GPU-efficient, locality-sensitive-hashing-like algorithm to approximate the large tensor of logits, this paper introduces a novel RECE (REduced Cross-Entropy) loss. RECE significantly reduces memory consumption while retaining the state-of-the-art performance of full CE loss. Experimental results on various datasets show that RECE cuts peak training memory usage by up to 12 times compared to existing methods while matching or exceeding the performance metrics of CE loss. The approach also opens up new possibilities for large-scale applications in other domains.
In sequential recommendation, pre-training on user historical behaviors through self-supervised learning can better capture users' dynamic preferences, presenting the potential for direct integration with Click-Through Rate (CTR) prediction tasks. Previous methods have integrated pre-trained models into downstream tasks with the sole purpose of extracting semantic information or well-represented user features, which are then incorporated as new features. However, these approaches tend to ignore the additional inference costs and do not consider how to transfer effective information from the pre-trained models to specific estimated items in CTR prediction. In this paper, we propose a Sequential Recommendation Pre-training framework for CTR prediction (SRP4CTR) to tackle these problems. We first discuss the impact of introducing pre-trained models on inference costs. Subsequently, we introduce a pre-training method that encodes sequence side information concurrently. During fine-tuning, we incorporate a cross-attention block to establish a bridge between estimated items and the pre-trained model at low cost. Moreover, we develop a querying transformer technique to facilitate knowledge transfer from the pre-trained model. Offline and online experiments show that our method outperforms previous baseline models.
Bottleneck identification is a challenging task in network analysis, especially when the network is not fully specified. To address this task, we develop a unified online learning framework based on combinatorial semi-bandits that performs bottleneck identification alongside learning the specifications of the underlying network. Within this framework, we adapt and investigate several combinatorial semi-bandit methods such as epsilon-greedy, LinUCB, BayesUCB, and Thompson Sampling. Our framework is able to employ contextual information in the form of contextual bandits. We evaluate our framework on the real-world application of road networks and demonstrate its effectiveness in different settings.
Nonparametric estimation of information divergence functionals between two probability densities is an important problem in machine learning. Several estimators exist that guarantee the parametric rate of mean squared error (MSE) of O(1/N) under various assumptions on the smoothness and boundary of the underlying densities, with N being the number of samples. In particular, previous work on ensemble estimation theory derived ensemble estimators of divergence functionals that achieve the parametric rate without requiring knowledge of the densities' support set and are simple to implement. However, these and most other methods all assume some level of differentiability of the divergence functional. This excludes important divergence functionals such as the total variation distance and the Bayes error rate. Here, we show empirically that the ensemble estimation approach for smooth functionals can be applied to less smooth functionals and obtain good convergence rates, suggesting a gap in current theory.
Prior research has demonstrated that reformulation of queries can significantly enhance retrieval effectiveness. Despite notable successes in neural-based query reformulation methods, identifying optimal reformulations that cover the same information need while enhancing retrieval effectiveness is still challenging. This paper introduces a two-step query reformulation framework for generating and selecting optimal target query variants which not only achieve higher retrieval performance but also preserve the original query's information need. Our comprehensive evaluations on the MS MARCO dataset and TREC Deep Learning tracks demonstrate substantial improvements over original query's performance.
Chemical reaction data has existed, and still largely exists, in unstructured forms. Curating such information into datasets suitable for tasks such as yield and reaction outcome prediction is impractical to do manually and cannot be automated through programmatic means alone. Large language models (LLMs) have emerged as potent tools with remarkable capabilities for processing textual information and could therefore be extremely useful in automating this process. To address the challenge of unstructured data, we manually curated a dataset of structured chemical reaction data to fine-tune and evaluate LLMs. We propose a paradigm that leverages prompt-tuning, fine-tuning techniques, and a verifier to check the extracted information. We evaluate the capabilities of various LLMs, including LLAMA-2 and GPT models with different parameter counts, on the data extraction task. Our results show that prompt tuning of GPT-4 yields the best accuracy and evaluation results. Fine-tuning LLAMA-2 models with hundreds of samples does, however, improve their ability to extract and organize scientific material according to user-defined schemas. This workflow demonstrates an adaptable approach for chemical reaction data extraction, while also highlighting the challenges associated with nuance in chemical information. We open-sourced our code at https://github.com/joker-bruce/LLM_Extraction_Chem.
Graph-based fraud detection faces significant challenges, such as severe class imbalance, inconsistent connections due to the scarcity of fraudulent nodes, and the camouflage of fraudulent nodes that appear benign. Existing studies often filter similar nodes to strengthen the homophily assumption of graph neural networks. However, to effectively address these issues, it is important to distinguish and adaptively utilize the labels of neighboring nodes. In this study, we propose the Label-Exploring Graph Neural Network (LEX-GNN), designed to enhance fraud detection by actively leveraging labeled node information. The core idea is that the manner of message passing and reception should vary depending on the node type. Specifically, we first predict the labels of nodes based on their original or previous representations. Subsequently, each node transmits differently processed messages according to its probability of being fraudulent. Finally, target nodes also receive the messages differently depending on their pre-predicted probability. Extensive experimental results on real-world benchmarks demonstrate that LEX-GNN outperforms existing state-of-the-art baselines. Our code is available at https://github.com/wdhyun/LEX-GNN.
Factor models, originating in finance for asset pricing, are fundamental tools in quantitative investment. Recently, there has been a trend towards adopting more flexible machine learning approaches instead of previous linear models. However, traditional factor models and recent deep learning approaches either overlook the relationships among stocks or rely on static, predefined ones, which hampers their representational power and hinders their ability to dynamically adapt to market changes. To overcome this limitation, we introduce a novel dynamic factor model named GraphVAE. This model leverages temporal adaptive dynamic stock relationship graphs, facilitating improved information transfer among stocks within the dynamic probabilistic factor model. Experimental results on three real stock market datasets demonstrate that our method outperforms various state-of-the-art approaches.
The Randomized Controlled Trial (RCT), or A/B testing, is considered the gold-standard method for estimating causal effects. Fisher famously advocated randomly allocating experimental units into treatment and control groups to preclude systematic biases. We propose a variant of systematic sampling called Covariate Ordered Systematic Sampling (COSS). In COSS, we order experimental units using a pre-experiment covariate and allocate them alternately into treatment and control groups. Using theoretical proofs, experiments on simulated data, and hundreds of A/B tests conducted within three real-world marketing campaigns, we show how our method achieves better sensitivity gains than commonly used variance-reduction techniques like CUPED while retaining the simplicity of RCTs.
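Because the allocation rule is stated explicitly, it admits a very small sketch: sort units by the pre-experiment covariate, then alternate assignment; randomizing which arm receives the first unit is an added assumption.

```python
# Minimal sketch of Covariate Ordered Systematic Sampling (COSS) as described:
# sort units by a pre-experiment covariate, then alternate treatment/control.
import random

def coss_assign(units, covariate):
    """units: list of unit ids; covariate: dict mapping unit id -> pre-experiment value."""
    ordered = sorted(units, key=lambda u: covariate[u])
    start = random.randint(0, 1)                 # randomize which arm gets the first unit (assumption)
    return {u: ("treatment" if (i + start) % 2 == 0 else "control")
            for i, u in enumerate(ordered)}
```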
This paper presents the Customer Experience (CX) Simulator, a novel framework designed to assess the effects of untested web-marketing campaigns through user behavior simulations. The proposed framework leverages large language models (LLMs) to represent various events in a user's behavioral history, such as viewing an item, applying a coupon, or purchasing an item, as semantic embedding vectors. We train a model to predict transitions between events from their LLM embeddings, which can even generalize to unseen events by learning from diverse training data. In web-marketing applications, we leverage this transition prediction model to simulate how users might react differently when new campaigns or products are presented to them. This allows us to eliminate the need for costly online testing and enhance the marketers' abilities to reveal insights. Our numerical evaluation and user study, utilizing BigQuery Public Datasets from the Google Merchandise Store, demonstrate the effectiveness of our framework.
Sequential recommendation aims to predict the next item a user is likely to prefer based on their sequential interaction history. Recently, text-based sequential recommendation has emerged as a promising paradigm that uses pre-trained language models to exploit textual item features to enhance performance and facilitate knowledge transfer to unseen datasets. However, existing text-based recommender models still struggle with two key challenges: (i) representing users and items with multiple attributes, and (ii) matching items with complex user interests. To address these challenges, we propose a novel model, Matching Attribute-aware Representations for Text-based Sequential Recommendation (MARS). MARS extracts detailed user and item representations through attribute-aware text encoding, capturing diverse user intents with multiple attribute-aware representations. It then computes user-item scores via attribute-wise interaction matching, effectively capturing attribute-level user preferences. Our extensive experiments demonstrate that MARS significantly outperforms existing sequential models, achieving improvements of up to 24.43% and 29.26% in Recall@10 and NDCG@10 across five benchmark datasets.
Bundle recommender systems aim to recommend suitable collections (i.e., bundles) of items to each user, meeting their diverse needs with all-in-one convenience. Typically, they utilize three distinct types of information: user-bundle purchase interactions (U-B view), user-item purchase interactions (U-I view), and bundle-item affiliations (B-I view). Our focus is on better integrating these three perspectives (i.e., views) to deliver more accurate bundle recommendations. Our examination of different role combinations of the views (main role or sub-role) reveals two key observations: (1) the best combination varies across target users (i.e., those who receive recommendations), and (2) the U-I view is relatively weak in the main role. Driven by these observations, we propose PET, which synergizes the three views through (1) personalized view weighting, (2) U-I view enhancement, and (3) two-pronged contrastive learning. Our extensive experiments demonstrate that PET significantly outperforms existing methods on all popular benchmark datasets. Our code and datasets are available at https://github.com/K-Kyungho/PET.
Identifying cohesive subgraphs within networks is a fundamental problem in graph theory, relevant to various domains. The traditional clique problem, which finds fully connected subgraphs, often faces limitations due to its strict connectivity requirements. This paper introduces a novel degree-based relaxation model called Flexi-clique, in which the degree constraint is adjusted sub-linearly with the subgraph size. We establish that the maximum Flexi-clique problem is NP-hard and propose an efficient and effective peeling algorithm to address it. Our extensive experimental evaluation on real-world datasets demonstrates the effectiveness and efficiency of our approach in discovering large, cohesive subgraphs in networks.
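A generic peeling heuristic for such a degree-based relaxation can be sketched as below: repeatedly remove the minimum-degree vertex and return the first (largest) intermediate subgraph whose minimum degree meets a sub-linear threshold f(|S|); the choice f(s) = ⌈√s⌉ - 1 is an illustrative assumption, and this greedy heuristic is neither the paper's exact algorithm nor guaranteed to find the maximum Flexi-clique.

```python
# Hedged sketch of a peeling heuristic for a degree-based relaxation: peel the
# minimum-degree vertex until the remaining subgraph meets a sub-linear degree
# threshold. The threshold f(s) = ceil(sqrt(s)) - 1 is an illustrative assumption.
import math
import networkx as nx

def peel_flexi_like(G, f=lambda s: math.ceil(math.sqrt(s)) - 1):
    H = G.copy()
    while H.number_of_nodes() > 0:
        degrees = dict(H.degree())
        if min(degrees.values()) >= f(H.number_of_nodes()):
            return H                              # largest intermediate subgraph meeting the threshold
        v = min(degrees, key=degrees.get)         # peel the minimum-degree vertex
        H.remove_node(v)
    return None
```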
Autonomous vehicles make decisions and control actions based on various object recognition results. The driving environment is characterized by the coexistence of many objects of varying shapes and sizes, so the ability to accurately recognise fine-grained objects is essential across changing situations. For object detection, autonomous vehicles use bounding boxes and segmentation to describe detected objects. However, bounding-box and segmentation-based detection struggles with objects of complex shape, small or distant objects, and objects whose colors or textures resemble the background or surrounding objects. These limitations hinder reliable object identification in driving environments containing a variety of objects and make clear, criteria-based avoidance and collision protection difficult. To overcome these limitations, this paper proposes Edge-Adaptive Depth Estimation (EADE). By combining edge extraction with depth estimation, EADE enables detailed edge extraction and partial distance estimation of objects even when object shape and size, surrounding objects, or the background make distinct objects hard to recognise, allowing reliable autonomous decision-making and control based on detailed collision and avoidance criteria. To validate EADE, experiments were conducted with real-world driving environment image data. The results demonstrate that detailed object recognition is possible, with clear edge recognition and object distance estimation, even for complex-shaped objects such as trees with branches in multiple directions, distant objects, and objects that are difficult to distinguish from the background, such as curbs.
Recent advances in Hierarchical Multi-label Classification (HMC), particularly neurosymbolic-based approaches, have demonstrated improved consistency and accuracy by enforcing constraints on a neural model during training. However, such work assumes that these constraints exist a priori. In this paper, we relax this strong assumption and present an approach based on Error Detection Rules (EDR) that allows for learning explainable rules about the failure modes of machine learning models. We show that these rules are not only effective in detecting when a machine learning classifier has made an error but can also be leveraged as constraints for HMC, thereby allowing the recovery of explainable constraints even when they are not provided. We show that our approach is effective in detecting machine learning errors and recovering constraints, is noise tolerant, and can serve as a source of knowledge for neurosymbolic models on multiple datasets, including a newly introduced military vehicle recognition dataset.
Text-to-image generation is a multivariable process in which the resulting quality is determined by both the generative model and the input prompt. While previous efforts rely on a single model, either by enhancing its capability or by reformulating prompts, we point out that no single model excels at handling all types of tasks, as there is inter-model and intra-model quality variance induced by differences in prompt types. This paper explores the relationship between the generation quality of text-to-image models and the linguistic features of input prompts by measuring the performance of state-of-the-art models on five different prompt datasets, each with its own distinctive features. Motivated by our empirical observations, we propose a novel approach that assigns each prompt to its best-performing model based on quality prediction. This enables the use of a diverse set of models, each with its own expertise and cost, thereby enhancing cost-effectiveness. Evaluation results show that our approach can reduce the total generation cost by 29.25% with comparable or even higher generation quality than using only the single best model.
Heterogeneous networks contain multiple types of nodes and links, with some link types encapsulating hierarchical structure over entities. Hierarchical relationships can codify information such as subcategories or one entity being subsumed by another, and are often used for organizing conceptual knowledge into a tree-structured graph. Hyperbolic embedding models learn node representations in a hyperbolic space suitable for preserving hierarchical structure. Unfortunately, current hyperbolic embedding models only implicitly capture the hierarchical structure, fail to distinguish between node types, and assume a single tree. In practice, many networks contain a mixture of hierarchical and non-hierarchical structures, and the hierarchical relations may be represented as multiple trees with complex structures, such as shared entities. In this work, we propose a new hyperbolic representation learning model that can handle complex hierarchical structures and learn representations of both hierarchical and non-hierarchical structures. We evaluate our model on several datasets and tasks, including node classification and identifying relevant articles for systematic reviews, an essential tool for evidence-driven medicine.
Item popularity in real-world data follows a long-tail distribution, where a few items attract most of the attention, while the majority receive much less. This disparity results in high-quality embeddings for popular (head) items, but lower-quality embeddings for unpopular (tail) items, leading to less accurate recommendations for the latter. Our observations confirm that embeddings of tail items often exhibit (1) magnitudes (i.e., norms) that are less reflective of actual popularity and (2) directions that are less effective in capturing user preferences, compared to those of head items.
To address this issue, we propose EDGE, a post-training embedding enhancement method for long-tail recommendations. EDGE employs two key strategies: (1) refining embedding magnitudes to better reflect item popularity and (2) adjusting embedding directions by leveraging knowledge from head items. Importantly, EDGE is model-agnostic and can be applied to embeddings learned from any trained recommender system. Experimental results show that EDGE significantly improves tail item recommendation performance and overall system performance, achieving up to an improvement of 211.23% in NDCG@20 over the state-of-the-art method. Our code and datasets are available at https://github.com/geon0325/EDGE.
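A sketch in the spirit of the two strategies above (but not the paper's actual procedure) is given below: tail-item norms are rescaled toward a popularity-to-norm trend fitted on head items, and tail-item directions are nudged toward their nearest head-item neighbors; the log-popularity linear fit, the neighbor count k, and the blend weight are all illustrative assumptions.

```python
# Hedged sketch of a post-training enhancement in the spirit described above:
# (1) rescale tail-item norms toward a popularity->norm trend fitted on head items,
# (2) nudge tail-item directions toward their nearest head-item neighbors.
import numpy as np

def enhance_tail_embeddings(E, popularity, head_idx, tail_idx, k=10, alpha=0.5):
    """E: (n_items, d) item embeddings; popularity: (n_items,) interaction counts."""
    E = E.copy()
    norms = np.linalg.norm(E, axis=1)
    # (1) fit norm ~ a*log(pop) + b on head items, then rescale tail norms to the fitted trend
    a, b = np.polyfit(np.log1p(popularity[head_idx]), norms[head_idx], deg=1)
    target = a * np.log1p(popularity[tail_idx]) + b
    dirs = E / (norms[:, None] + 1e-12)
    E[tail_idx] = dirs[tail_idx] * target[:, None]
    # (2) blend each tail direction with the mean direction of its k nearest head items
    sims = dirs[tail_idx] @ dirs[head_idx].T
    nn = np.argsort(-sims, axis=1)[:, :k]
    blended = (1 - alpha) * dirs[tail_idx] + alpha * dirs[head_idx][nn].mean(axis=1)
    blended /= np.linalg.norm(blended, axis=1, keepdims=True) + 1e-12
    E[tail_idx] = blended * np.linalg.norm(E[tail_idx], axis=1, keepdims=True)
    return E
```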
The goal of document-level relation extraction is to extract semantic information from multiple sentences within a document and identify the relations between entities across sentences. However, effectively representing the document's content and reasoning about cross-sentence entities presents a formidable challenge. In this paper, we propose an efficient Document-Level Relation Extraction Model based on Heterogeneous Graph Reasoning (HGR-DREM), which makes relation extraction more accurate. Specifically, we first construct a document-level heterogeneous graph to comprehensively capture the semantic relations between entities. Then, we design a meta-path attention-based reasoning mechanism to enhance the mutual influence among graph nodes. Furthermore, we utilize an extended adjacency matrix to represent the heterogeneous graph and leverage graph convolutional networks (GCNs) to extract high-dimensional features. Experiments on a real-world dataset demonstrate the effectiveness of our proposed model. All code has been released at https://github.com/NuyoaH-code/HGR-DREM.
As the Internet of Things (IoT) evolves, the need for enhanced data-sharing to improve edge device performance has led to the adoption of Federated Learning (FL) for data privacy and optimized data utilization. However, communication costs in FL remain a significant challenge. Traditional methods focus on client enhancements but overlook server-side aggregation, potentially increasing client computation loads. In response, we introduce a novel method, FedDiff, which utilizes diffusion models for generating model weights on FL servers, replacing traditional aggregation methods. Our approach, tailored for heterogeneous environments, significantly improves communication efficiency, achieving faster convergence and robust performance against weight noise in rigorous tests.
Significant work has been conducted in the domain of food computing, yet these studies typically focus on single tasks such as t2t (instruction generation from food titles and ingredients), i2t (recipe generation from food images), or t2i (food image generation from recipes). None of these approaches integrate all modalities simultaneously. To address this gap, we introduce a novel food computing foundation model that achieves true multimodality, encompassing tasks such as t2t, t2i, i2t, it2t, and t2ti. By leveraging large language models (LLMs) and pre-trained image encoder and decoder models, our model can perform a diverse array of food computing-related tasks, including food understanding, food recognition, recipe generation, and food image generation. Compared to previous models, our foundation model demonstrates a significantly broader range of capabilities and exhibits superior performance, particularly in food image generation and recipe generation tasks. We open-sourced ChefFusion at https://github.com/Peiyu-Georgia-Li/ChefFusion-Multimodal-Foundation-Model-Integrating-Recipe-and-Food-Image-Generation.git.
The k-center clustering problem is of fundamental importance for a broad range of machine learning and data science applications. In this paper, we study the deletion-robust version of the problem. Specifically, we aim to extract a small subset of a given data set, referred to as a coreset, that contains a provably good set of k centers even after an adversary deletes up to z arbitrarily chosen points from the data set. We propose a 4-approximation algorithm that provides a coreset of size O(kz). To our knowledge, this is the first algorithm for deletion-robust k-center clustering with a theoretical guarantee. Moreover, we accompany our theoretical results with extensive experiments, demonstrating that our algorithm achieves significantly better robustness than non-trivial baselines against three heuristic gray-box and white-box adversarial deletion attacks.
Multi-behavior recommendation (MBR) aims to predict the items that a user will interact with at the next moment through the target behavior. Most existing MBR models are devoted to designing novel graph convolutional networks to combine multi-behavior information. However, they ignore the negative impact of auxiliary behaviors and fail to take into account the effects of item characteristics. These limitations can degrade model performance and affect user satisfaction. To address these issues, we propose a Knowledge-enhanced Dynamic Modeling framework for Multi-Behavior Recommendation (KDMBR). The framework utilizes a multi-behavior interaction module and a knowledge graph module to capture the user's overall interests and feature information, respectively. The former uses behavior-aware attention to distinguish the contributions of different behaviors; the latter introduces a knowledge graph (KG) to enrich item characteristics and employs a graph reconstruction strategy to enrich user information. Experiments on two large datasets further demonstrate the effectiveness of KDMBR.
Prompt learning plays a key role in aligning the news recommendation (NR) task with Pre-trained Language Models (PLMs). However, current prompt-based NR methods use fixed templates and answer words, ignoring the personalization of users' demands and the diversity of news topics. To this end, we propose an Automatic Prompt based NR (AutoPNR) scheme, which automatically generates individual templates for users according to their potential interests, and customized answer words with respect to the topics of candidate news. Concretely, such an individual template uses several specific tokens to encode a user's interest extracted from her/his reading history, while a pair of customized answer words is retrieved from a large vocabulary (often available alongside PLMs) based on the topic of the candidate news. Through extensive experiments on real-world datasets, we show that AutoPNR works well with different PLMs and considerably outperforms state-of-the-art NR techniques.
Significant wave height (SWH) is a vital metric in marine science, and accurate SWH estimation is crucial for applications such as marine energy development, fishery, and early warning systems for potential risks. Traditional SWH estimation methods based on numerical models and physical theories are hindered by computational inefficiency. Recently, machine learning has emerged as an appealing alternative to improve accuracy and reduce computation time. However, due to limited observational technology and high costs, the scarcity of real-world data restricts the potential of machine learning models. To overcome these limitations, we propose an ocean SWH estimation framework named Orca. Specifically, Orca enhances the limited spatio-temporal reasoning abilities of classic LLMs with a novel spatio-temporal-aware encoding module. By segmenting the limited buoy observational data temporally, encoding the buoys' locations spatially, and designing prompt templates, Orca capitalizes on the robust generalization ability of LLMs to estimate significant wave height effectively with limited data. Experimental results on the Gulf of Mexico demonstrate that Orca achieves state-of-the-art performance in SWH estimation.
Job-market mobility prediction plays a crucial role in optimizing human capital for both employees and employers. Most conventional methods focus on learning sequential career trajectories while failing to sufficiently extract the mutual correlations among entities in the job market. In this work, we exploit the heterogeneous relational knowledge among job-market structures by proposing a model named Attentive Heterogeneous Knowledge Learning and Synergy (AHKLS). Equipped with a subsequent time-aware perception module, AHKLS achieves effective career trajectory encoding for job-market mobility prediction. To evaluate AHKLS, we conduct extensive experiments on three real-world datasets of different sizes. The empirical analyses demonstrate not only the performance superiority of AHKLS over several competing methods, but also the effectiveness of its modules and its compatibility with other methods in enhancing mobility prediction tasks.
News recommendations heavily rely on Natural Language Processing (NLP) methods to analyze, understand, and categorize content, enabling personalized suggestions based on user interests and reading behaviors. Large Language Models (LLMs) like GPT-4 have shown promising performance in understanding natural language. However, the extent of their applicability to news recommendation systems remains to be validated. This paper introduces RecPrompt, the first self-tuning prompting framework for news recommendation, leveraging the capabilities of LLMs to perform complex news recommendation tasks. This framework incorporates a news recommender and a prompt optimizer that applies an iterative bootstrapping process to enhance recommendations through automatic prompt engineering. Extensive experimental results with 400 users show that RecPrompt can achieve an improvement of 3.36% in AUC, 10.49% in MRR, 9.64% in nDCG@5, and 6.20% in nDCG@10 compared to deep neural models. Additionally, we introduce TopicScore, a novel metric to assess explainability by evaluating LLM's ability to summarize topics of interest for users. The results show LLM's effectiveness in accurately identifying topics of interest and delivering comprehensive topic-based explanations.
The shuffle model of Differential Privacy (DP) is an enhanced privacy protocol which significantly amplifies the central DP guarantee by anonymizing and shuffling the local randomized data. Yet, deriving a tight privacy bound is challenging due to its complicated randomization protocol. While most existing works focused on uniform local privacy settings, this work focuses on a more practical personalized privacy setting. To bound the privacy after shuffling, we need to capture the probability of each user generating clones of the neighboring data points and quantify the indistinguishability between two distributions of the number of clones on neighboring datasets. Existing works either inaccurately capture the probability or underestimate the indistinguishability. We develop a more precise analysis, which yields a general and tighter bound for arbitrary DP mechanisms. Firstly, we derive the clone-generating probability by hypothesis testing, which leads to a more accurate characterization of the probability. Secondly, we analyze the indistinguishability in the context of f-DP, where the convexity of the distributions is leveraged to achieve a tighter privacy bound. Theoretical and numerical results demonstrate that our bound remarkably outperforms the existing results in the literature. The code is publicly available at https://github.com/Emory-AIMS/HPS.git.
Attributed graph clustering, which aims to group the nodes of an attributed graph into disjoint clusters, has made promising advancements in recent years. However, most existing methods face challenges when applied to large graphs due to the expensive computational cost and high memory usage. In this paper, we introduce Scalable and Adaptive Spectral Embedding (SASE), a simple attributed graph clustering method devoid of parameter learning. SASE comprises three main components: node features smoothing via k-order simple graph convolution, scalable spectral clustering using random Fourier features, and adaptive order selection. With these designs, SASE not only effectively captures global cluster structures but also exhibits linear time and space complexity relative to the graph size. Empirical results demonstrate the superiority of SASE. For example, on the ArXiv dataset with 169K nodes and 1.17M edges, SASE achieves a 6.9% improvement in ACC and a 5.87× speedup compared to the runner-up, S3GC.
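The smoothing-then-cluster recipe in SASE can be illustrated with a short sketch: k-order simple graph convolution over a symmetrically normalized adjacency, followed by spectral clustering approximated with random Fourier features. This is only an illustrative reconstruction under assumptions (fixed order k, scikit-learn's RBFSampler as the feature map, k-means in the lifted space); the adaptive order selection and the released SASE code are not reproduced here.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.cluster import KMeans

def smooth_features(adj, X, k=3):
    """k-order simple graph convolution: repeatedly average node features
    over the symmetrically normalized adjacency (with self-loops)."""
    A = adj + np.eye(adj.shape[0])                   # add self-loops
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt              # symmetric normalization
    H = X.copy()
    for _ in range(k):
        H = A_hat @ H                                # one smoothing step
    return H

def spectral_cluster_rff(H, n_clusters=4, n_components=128, seed=0):
    """Approximate spectral clustering: lift smoothed features with random
    Fourier features, then run k-means in the lifted space."""
    rff = RBFSampler(gamma=1.0, n_components=n_components, random_state=seed)
    Z = rff.fit_transform(H)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(Z)

# toy usage: a small random graph with random node features
rng = np.random.default_rng(0)
A = (rng.random((30, 30)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T                       # symmetric, no self-loops
X = rng.normal(size=(30, 16))
labels = spectral_cluster_rff(smooth_features(A, X, k=3))
print(labels)
```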
With the advent of generative deep learning models, generative IR has gained increasing attention. However, existing methods face two issues: (1) when a document is represented by a single semantic ID, the retrieval model may fail to capture the multifaceted and complex content of the document; and (2) when the generated training data exhibits semantic ambiguity, the retrieval model may struggle to distinguish the differences in the content of similar documents. To address these issues, we propose Multi-DSI to (1) offer multiple non-deterministic semantic identifiers and (2) align the concepts of queries and documents to avoid ambiguity. Extensive experiments on two benchmark datasets demonstrate that Multi-DSI significantly outperforms baseline methods by 7.4%.
This study challenges the prevailing approach of measuring political leanings in Large Language Models (LLMs) through direct questioning. By extensively testing LLMs with original, positively paraphrased, and negatively paraphrased Political Compass questions, we demonstrate that LLMs do not consistently reveal their political biases in response to standard questions. Our findings indicate that LLMs' political orientations are elusive and easily influenced by subtle changes in phrasing and context. This study underscores the limitations of direct questioning in accurately measuring the political biases of LLMs and emphasizes the necessity for more refined and effective approaches to understand their true political stances.
Conversational Product Search (CPS) provides an engaging way for users to find products through effective natural language conversations. However, the effect of conversational characteristics on user search performance, and when to ask clarifying questions or recommend products, remains unexplored. To fill the gap, we conduct an empirical study in this paper. Specifically, we developed a conversational system that allows participants to join as customers or shopping assistants, to simulate the conversational product search activity. Data collected from conversations and participant feedback indicate that: (a) CPS systems tend to ask clarifying questions early in the conversation when users express the intent of issuing a new query or chitchatting, while they tend to recommend products at a later stage of the conversation; asking clarifying questions early and recommending products later can significantly improve search performance and user satisfaction; (b) asking clarifying questions and using more fine-grained search keywords positively influence search performance in terms of finding relevant products; (c) although conversation time has a positive impact on the number of recommended products, the performance gain diminishes with longer conversation time; (d) more clarifying questions, more conversation turns, and longer system response time lead to decreased user satisfaction.
Autism spectrum disorder (ASD) is a prevalent neurodevelopmental condition. Prompt recognition and treatment are vital for enhancing the quality of life of individuals affected by ASD. However, current research either focuses on a single atlas or simply concatenates matrices from multiple atlases, neglecting the complex spatial relationships among brain regions across atlases. To tackle this weakness, we propose a novel three-step multi-atlas time-series feature fusion model based on the spatial overlap proportion of brain regions, which produces an explainable representation of brain networks and aims to achieve accurate ASD/TC diagnosis. Specifically, we formally introduce the concept of spatial overlap and its measurement, the spatial overlap proportion. Then, we fuse the brain regions of multiple atlases to obtain an explainable brain network for each subject. Finally, a GCN classifier performs the final classification. Experimental results on the Autism Brain Imaging Data Exchange (ABIDE) dataset demonstrate that our proposed method achieves an accuracy of 0.771. Overall, our method outperforms SOTA methods in ASD/TC classification.
While considerable research has delved into detecting toxic content in text-based data, the realm of video content, particularly in languages other than English, has received less attention. Prior studies have primarily focused on creating automated tools to identify online toxic speech but have often overlooked the crucial next steps of mitigating its impact and discouraging future use. We can discourage social media users from sharing such material by automatically generating interventions that explain why certain content is inappropriate. To bridge this research gap, we propose an innovative task: generating interventions for toxic videos in code-mixed languages, which goes beyond existing methods focused on text and images to combat online toxicity. We introduce a Toxic Code-Mixed Intervention Video benchmark dataset (ToxCMI), comprising 1,697 code-mixed toxic video utterances sourced from YouTube. Each utterance in this dataset has been meticulously annotated for toxicity and severity, accompanied by interventions provided in Hindi-English code-mixed language. We have developed an advanced multimodal framework, ToxVI, specifically designed for generating appropriate interventions for toxic videos by leveraging Large Language Models (LLMs); it comprises three modules: a Modality module, a Cross-Modal Synchronization module, and a Generation module. Our experiments demonstrate that integrating multiple modalities from the videos significantly enhances the performance of the proposed task and outperforms all the baselines by a significant margin.
Items with missing modalities are generally dropped in multimodal recommendation. In this work, we question this procedure, highlighting that it further damages the pipeline of any multimodal recommender system. First, we show that the lack of (some) modalities is, in fact, a widespread phenomenon in multimodal recommendation. Second, we propose a pipeline that imputes missing multimodal features in recommendation by leveraging traditional imputation strategies from machine learning. Then, given the graph structure of the recommendation data, we also propose three more effective imputation solutions that leverage the item-item co-purchase graph and the multimodal similarities of co-interacted items. Our method can be plugged into any multimodal recommender system in the literature as an untrained pre-processing phase, and we show through extensive experiments that such data pre-filtering is not only unnecessary but also harmful to performance.
Shopping query suggestion offers personalized queries to users and plays a crucial role in search engines. However, existing shopping query suggestion methods suffer from poor task generalization and limited semantic comprehension. This paper presents a comprehensive framework for shopping query suggestion that effectively addresses the shortcomings of existing approaches. Our framework leverages a generative language model and fine-grained preference alignment to enhance semantic comprehension and improve the quality of generated queries. Our key contributions include the introduction of a personalized prompt set for diverse query suggestion tasks, the integration of interaction behavior time to capture user query interests, and the utilization of reinforcement learning techniques to align with user preferences. Experimental results demonstrate enhancements in different scenarios. Our code is available at https://github.com/1170300319/CIKM2024_SOUP.
Personalized conversational information retrieval (CIR) combines conversational and personalizable elements to satisfy users' complex information needs through multi-turn interaction based on their backgrounds. The key promise is that a personal textual knowledge base (PTKB) can improve CIR effectiveness because the retrieval results can be made more relevant to the user's background. However, a PTKB is noisy: not every piece of knowledge in it is relevant to the specific query at hand. In this paper, we explore and test several ways to select knowledge from a PTKB and use it for query reformulation with a large language model (LLM). The experimental results show that the PTKB might not always improve the search results when used alone, but an LLM can help generate a more appropriate personalized query when high-quality guidance is provided.
The balance between model capacity and generalization has been a key focus of recent discussions in long-term time series forecasting. Two representative channel strategies are closely associated with model expressivity and robustness, including channel independence (CI) and channel dependence (CD). The former adopts individual channel treatment and has been shown to be more robust to distribution shifts, but lacks sufficient capacity to model meaningful channel interactions. The latter is more expressive for representing complex cross-channel dependencies, but is prone to overfitting. To balance the two strategies, we present a channel-aware low-rank adaptation method to condition CD models on identity-aware individual components. As a plug-in solution, it is adaptable for a wide range of backbone architectures. Extensive experiments show that it can consistently and significantly improve the performance of both CI and CD models with demonstrated efficiency and flexibility. The code is available at https://github.com/tongnie/C-LoRA.
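The idea of conditioning a channel-dependent (CD) backbone on identity-aware low-rank components can be sketched as follows. The adapter below adds a per-channel low-rank correction to a shared linear forecasting head; the linear backbone, rank, and shapes are assumptions for illustration, not the released C-LoRA implementation.

```python
import torch
import torch.nn as nn

class ChannelAwareLoRA(nn.Module):
    """Shared linear forecaster (CD-style) plus a per-channel low-rank
    correction A_c @ B, so each channel gets an identity-aware adjustment."""
    def __init__(self, n_channels, lookback, horizon, rank=4):
        super().__init__()
        self.shared = nn.Linear(lookback, horizon)              # shared CD head
        self.A = nn.Parameter(torch.zeros(n_channels, lookback, rank))
        self.B = nn.Parameter(torch.randn(rank, horizon) * 0.01)

    def forward(self, x):
        # x: (batch, n_channels, lookback)
        base = self.shared(x)                                    # shared prediction
        low_rank = torch.einsum('bcl,clr,rh->bch', x, self.A, self.B)
        return base + low_rank                                   # channel-aware residual

model = ChannelAwareLoRA(n_channels=7, lookback=96, horizon=24)
y = model(torch.randn(8, 7, 96))
print(y.shape)  # torch.Size([8, 7, 24])
```

Because the low-rank factors start at zero, the adapter initially leaves the shared model untouched and only gradually learns channel-specific deviations, which is the usual way such plug-in adaptations stay robust.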
Rapid advancements in artificial intelligence (AI) have made it crucial to integrate moral reasoning into AI systems. However, existing models and datasets often overlook regional and cultural differences. To address this shortcoming, we have expanded the JCommonsenseMorality (JCM) dataset, the only publicly available dataset focused on Japanese morality. The Extended JCM (eJCM) has grown from the original 13,975 sentences to 31,184 sentences using our proposed sentence expansion method called Masked Token and Label Enhancement (MTLE). MTLE selectively masks important parts of sentences related to moral judgment and replaces them with alternative expressions generated by a large language model (LLM), while re-assigning appropriate labels. The model trained using our eJCM achieved an F1 score of 0.857, higher than the scores for the original JCM (0.837), ChatGPT one-shot classification (0.841), and data augmented using AugGPT, a state-of-the-art augmentation method (0.850). Specifically, in complex moral reasoning tasks unique to Japanese culture, the model trained with eJCM showed a significant improvement in performance (increasing from 0.681 to 0.756) and achieved a performance close to that of GPT-4 Turbo (0.787). These results demonstrate the validity of the eJCM dataset and the importance of developing models and datasets that consider the cultural context.
Many existing Graph Neural Network (GNN) methods assume that labels are reliable and sufficient, which may not be the case in real-world scenarios. This paper addresses one such problem: Partial Label Learning (PLL) on graph-structured data. In PLL for graphs, each node is associated with a candidate set of labels, of which only one is true while the others are inaccurate. Despite advancements of PLL in the tabular and vision domains, graph-structured data remains underexplored. In this work, we first define PLL for graphs. Subsequently, we propose a new PLD-Graph algorithm for PLL in homogeneous graphs with scarce labels. We utilize graph augmentation to reduce the effects of inexact labels and to provide additional supervision from unlabeled nodes. Progressive label disambiguation is performed based on the model's ability to predict correct classes. Furthermore, an additional loss estimates the label corruption matrix to capture associations between correct and incorrect labels. We show the effectiveness of the proposed algorithm on multiple graph datasets, with two types of noise and varying levels of ambiguous labels. Overall, the proposed PLD-Graph algorithm outperforms state-of-the-art PLL methods.
Electronic health records (EHRs) are multimodal by nature, consisting of structured tabular features like lab tests and unstructured clinical notes. In real-life clinical practice, doctors use complementary multimodal EHR data sources to get a clearer picture of patients' health and support clinical decision-making. However, most EHR predictive models do not reflect these procedures, as they either focus on a single modality or overlook the inter-modality interactions/redundancy. In this work, we propose MEDFuse, a Multimodal EHR Data Fusion framework that incorporates masked lab-test modeling and large language models (LLMs) to effectively integrate structured and unstructured medical data. MEDFuse leverages multimodal embeddings extracted from two sources: LLMs fine-tuned on free clinical text and masked tabular transformers trained on structured lab test results. We design a disentangled transformer module, optimized by a mutual information loss to 1) decouple modality-specific and modality-shared information and 2) extract useful joint representation from the noise and redundancy present in clinical notes. Through comprehensive validation on the public MIMIC-III dataset and the in-house FEMH dataset, MEDFuse demonstrates great potential in advancing clinical predictions, achieving over 90% F1 score in the 10-disease multi-label classification task.
Automatic clustering of news articles is one of the most important tasks for news publishers. Traditional unsupervised models exploit generic text representations (e.g., BERT) and typically do not consider the relationships between the paragraphs of a news article. Such deeper modeling of news articles is important for clustering full-length articles. Recently, contrastive learning (CL) has become a popular method for representation learning; it uses positive and negative data pairs generated via data augmentation to improve representations in the latent space. In this work, we propose text augmentation methods and use contrastive learning to cluster a daily growing collection of full-length German news articles. Our experiments on four German news article datasets (one labeled and three unlabeled) demonstrate that contrastive learning combined with our text augmentation methods significantly improves news article representations compared to generic pre-trained text representations and achieves high performance on clustering tasks.
Numerous reinforcement learning-based traffic control methods have been proposed to enhance transportation efficiency and alleviate traffic congestion. However, existing solutions predominantly address only specific facets of the challenge. For instance, some focus on enhancing control performance, while others address control within heterogeneous road networks, explore model generalizability, or consider model compression for practical deployment. The question arises: can a single approach effectively tackle all these issues concurrently? We propose a holistic framework, Hol-Light, that effectively addresses all of them. It delves deeply into the feature representations of traffic phases, considering the interplay between phase relationships and traffic flow dynamics, and captures inter-phase relationships through a carefully designed, parameter-efficient model. To substantiate the efficacy of our approach, we conducted comprehensive experiments using two well-established traffic simulators, CityFlow and SUMO. The results indicate that our method excels in performance, training speed, generalization capability, and adaptability to diverse road network configurations.
Achieving coordination while avoiding suboptimal equilibria poses a significant challenge in decentralized multi-agent reinforcement learning (MARL) systems operating under limited global information. Conventional decentralized approaches have struggled to effectively induce cooperative behaviors between agents. We propose a novel hierarchical framework that synergistically combines large language models (LLMs) and deep reinforcement learning to address this challenge. Our learning-to-share (ILTS) method decomposes the global objective into a two-level hierarchy: a high-level LLM policy determines how to share rewards between neighboring agents to shape emergent collaboration, while low-level policies optimize the induced localized objectives using Q-value networks. A dynamic reward-sharing scheme, meta-learned via LLMs, is developed to facilitate decentralized cooperation without explicit communication. Experimental results demonstrate that ILTS outperforms prior MARL algorithms across cooperative multi-agent tasks by inducing collaborative strategies and propagating intentions from lightweight learned signals. This hierarchical framework avoids the need for hand-engineered rewards or explicit communication while promoting scalable learning of intricate symbiotic behaviors between agents perceiving only local observations.
Detecting the veracity of a statement automatically is a challenge the world is grappling with due to the vast amount of data spread across the web. Verifying a given claim typically entails validating it within the framework of supporting evidence like a retrieved piece of text. Classifying the stance of the text with respect to the claim is called stance classification. Despite advancements in automated fact-checking, most systems still rely on a substantial quantity of labeled training data, which can be costly. In this work, we avoid the costly training or fine-tuning of models by reusing pre-trained large language models together with few-shot in-context learning. Since we do not train any model, our approach ExPrompt is lightweight, demands fewer resources than other stance classification methods and can serve as a modern baseline for future developments. At the same time, our evaluation shows that our approach is able to outperform former state-of-the-art stance classification approaches regarding accuracy by at least 2 percent. Our scripts and data used in this paper are available at https://github.com/factcheckerr/ExPrompt.
Predicting students' performance early in programming courses is crucial because it allows instructors to intervene early, improving learning outcomes. Currently, no existing platforms can effectively forecast student performance in programming activities based on students' developed code. Forecasting student scores based on their programming activities is challenging because the accuracy of different predictive models often varies throughout these activities. To address this challenge, we introduce a novel framework utilizing Mixture of Experts (MoE). The MoE method combines insights from various neural networks and dynamically picks the most accurate predictions. This system significantly enhances the reliability of forecasting each student's performance within the first 15 minutes of a 30-minute programming session. By enabling early predictions, the MoE provides instructors with a powerful mechanism to understand and support the student learning process in real-time.
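A Mixture-of-Experts combiner of the kind described above can be sketched in a few lines: several expert networks produce score predictions, and a learned gate weights them per student. The network sizes and the regression setup are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Combine several expert predictors with a learned gate; the gate
    weights each expert per example so the most reliable one dominates."""
    def __init__(self, in_dim, n_experts=3, hidden=32):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_experts)
        ])
        self.gate = nn.Linear(in_dim, n_experts)

    def forward(self, x):
        preds = torch.stack([e(x).squeeze(-1) for e in self.experts], dim=-1)  # (B, E)
        weights = torch.softmax(self.gate(x), dim=-1)                          # (B, E)
        return (weights * preds).sum(dim=-1)                                   # weighted score

moe = MixtureOfExperts(in_dim=20)
scores = moe(torch.randn(16, 20))   # predicted final scores for 16 students
print(scores.shape)
```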
Large neural models are often compressed before deployment. Model compression is necessary for many practical reasons, such as inference latency, memory footprint, and energy consumption. Compressed models are assumed to be miniature versions of corresponding large neural models. However, we question this belief in our work. We compare compressed models with corresponding large neural models using four model characteristics: prediction errors, data representation, data distribution, and vulnerability to adversarial attack. We perform experiments using the BERT-large model and its five compressed versions. For all four model characteristics, compressed models significantly differ from the BERT-large model. Even among compressed models, they differ from each other on all four model characteristics. Apart from the expected loss in model performance, there are major side effects of using compressed models to replace large neural models.
As the calculation of centrality in complex networks becomes increasingly vital across technological, biological, and social systems, precise and scalable ranking methods are essential for understanding these networks. This paper introduces LayerPlexRank, an algorithm that simultaneously assesses node centrality and layer influence in multiplex networks using algebraic connectivity metrics. This method enhances the robustness of the ranking algorithm by effectively assessing structural changes across layers using random walk, considering the overall connectivity of the graph. We substantiate the utility of LayerPlexRank with theoretical analyses and empirical validations on varied real-world datasets, contrasting it with established centrality measures.
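Algebraic connectivity, the quantity LayerPlexRank builds on, is the second-smallest eigenvalue of a graph Laplacian. The sketch below computes it per layer for a toy two-layer multiplex network; the full ranking algorithm (the random-walk coupling of node centrality and layer influence) is not reproduced here.

```python
import numpy as np

def algebraic_connectivity(adj):
    """Second-smallest eigenvalue of the unnormalized Laplacian L = D - A."""
    L = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.eigvalsh(L)[1]   # eigvalsh returns eigenvalues in ascending order

# toy multiplex network: two layers over the same 5 nodes
n = 5
path = np.zeros((n, n))
for i in range(n - 1):
    path[i, i + 1] = path[i + 1, i] = 1
cycle = path.copy()
cycle[0, n - 1] = cycle[n - 1, 0] = 1

for name, layer in [("path layer", path), ("cycle layer", cycle)]:
    print(name, round(algebraic_connectivity(layer), 4))   # the cycle layer is better connected
```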
We leverage generative NLP-based models, specifically Transformer-Based models, for multi-horizon univariate and multivariate power consumption forecasting. We apply our approach to various datasets, focusing on short-term (1 day) and long-term (1 week) forecasts. We test several lag configurations with and without additional contextual information and achieve promising results. We evaluate the forecasts' effectiveness using a range of metrics, and aggregate the results on a monthly basis for a comprehensive understanding of the performance throughout the year.
Graph Neural Networks (GNNs) have emerged as the predominant method for analyzing graph-structured data. However, canonical GNNs have limited expressive power and generalization capability, thus triggering the development of more expressive yet computationally intensive methods. One such approach is to create a series of perturbed versions of input graphs and then repeatedly conduct multiple message-passing operations on all variations during training. Despite their expressive power, this approach does not scale well on larger graphs. To address this scalability issue, we introduce Scalable Expressiveness through Preprocessed Graph Perturbation (SE2P). This model offers a flexible, configurable balance between scalability and generalizability with four distinct configuration classes. Our extensive experiments demonstrate that SE2P can enhance generalizability compared to benchmarks while achieving significant speed improvements of up to 8-fold.
As machine learning and deep learning become increasingly integrated into our daily lives, understanding how these technologies make decisions is crucial. To ensure transparency, accountability, and ethical adherence, these so-called "black-box" models should be accompanied by human-comprehensible explanations of their predictions. This clarity is essential for establishing trust in their real-world applications. Similarly, it is crucial to compare different types of explanations to evaluate and understand their effectiveness, interpretability, and generalization capabilities for informed selection in various applications. To this end, we propose a framework called EDGE to evaluate diverse knowledge graph explanations, assessing logical rule-based and subgraph-based explanations by various explainers in terms of prediction accuracy and fidelity to the Graph Neural Network (GNN) model. Our evaluations reveal that logical methods excel in explaining complex and structured data, while subgraph-based models exhibit higher fidelity to the GNN model, earning them the label "GNN Explainers". Although further diversified evaluations are necessary to determine the superiority of one explanation type over another, our study shows that each type has pros and cons.
Traffic speed prediction is a crucial task for optimizing navigation systems and reducing traffic congestion. Although there have been efforts to improve the accuracy of speed prediction by incorporating auxiliary features, such as traffic flow, weather, and time, types of auxiliary features are limited and their detailed relationships with speed have not been explored yet. In our study, we present the individual spatio-temporal (IST) dependencies on flow and speed, and characterize three types of IST-dependencies with the flow-to-flow, speed-to-speed, and flow-to-speed graphs. Then, we propose Auxiliary feature-aided Attention Network (ARIAN), a novel approach to judiciously learning the degrees of IST-dependencies with the three graphs and predicting the future speed by leveraging various auxiliary features. Through comprehensive experiments using 3 real-world datasets, we validate the superiority of ARIAN over 10 state-of-the-art methods and the effectiveness of each auxiliary feature and each dependency learner in ARIAN.
Verifying fact-checking claims poses a significant challenge, even for humans. Recent approaches have demonstrated that decomposing claims into relevant questions to gather evidence enhances the efficiency of the fact-checking process. In this paper, we provide empirical evidence showing that this question decomposition can be effectively automated. We demonstrate that smaller generative models, fine-tuned for the question generation task using data augmentation from various datasets, outperform large language models by up to 8. Surprisingly, in some cases, the evidence retrieved using machine-generated questions proves to be significantly more effective for fact-checking than that obtained from human-written questions. We also perform manual evaluation of the decomposed questions to assess the quality of the questions generated.
Computer vision applications such as object detection have increased manifold in the medical domain for diagnosis and treatment purposes. Generally, object detection models such as YOLO (You Only Look Once) involve identifying the correct bounding box and classifying the objects inside it. However, object detection in medical imaging is a challenging endeavor, requiring models that are both efficient and highly accurate in the face of limited data and expensive annotations. In this paper, we propose a Min-Max IoU (M2IoU) loss function that introduces a new min-max-based penalty term, between the predicted box and the ground-truth coordinates, into the loss equation. We further compare several loss functions on the YOLOv8 model trained on multiple medical datasets and demonstrate that the M2IoU loss function leads to faster learning and outperforms existing loss functions such as CIoU and GIoU.
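The abstract does not spell out the exact penalty, so the sketch below only illustrates the general shape of such a loss: a standard IoU term plus a hypothetical min-max penalty on the worst corner deviation, normalized by the enclosing box. The penalty form is an assumption for illustration and should not be read as the published M2IoU formula.

```python
import torch

def iou_with_minmax_penalty(pred, gt, eps=1e-7):
    """pred, gt: (N, 4) boxes as (x1, y1, x2, y2).
    Returns 1 - IoU plus a hypothetical min-max penalty (illustrative only)."""
    # intersection and IoU
    x1 = torch.maximum(pred[:, 0], gt[:, 0]); y1 = torch.maximum(pred[:, 1], gt[:, 1])
    x2 = torch.minimum(pred[:, 2], gt[:, 2]); y2 = torch.minimum(pred[:, 3], gt[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    iou = inter / (area_p + area_g - inter + eps)

    # hypothetical penalty: worst (max) corner deviation, scaled by the
    # diagonal of the smallest enclosing box to keep it scale-invariant
    ex1 = torch.minimum(pred[:, 0], gt[:, 0]); ey1 = torch.minimum(pred[:, 1], gt[:, 1])
    ex2 = torch.maximum(pred[:, 2], gt[:, 2]); ey2 = torch.maximum(pred[:, 3], gt[:, 3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps
    worst_dev = ((pred - gt) ** 2).max(dim=1).values
    return (1 - iou) + worst_dev / diag2

pred = torch.tensor([[10., 10., 50., 50.]])
gt = torch.tensor([[12., 8., 48., 55.]])
print(iou_with_minmax_penalty(pred, gt))
```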
Web-scale search systems typically tackle the scalability challenge with a two-step paradigm: retrieval and ranking. The retrieval step, also known as candidate selection, often involves extracting entities, creating an inverted index, and performing term matching for retrieval. Such traditional methods require manual and time-consuming development of retrieval models. In this paper, we propose a framework for constructing a graph that integrates human knowledge with user activity data analysis. The learned links are utilized for retrieval purposes. The model is easy to explain, debug, and tune. The system implementation is straightforward and can directly leverage existing inverted index systems. We applied this retrieval framework to enhance the job search and recommendation systems on a large professional networking portal, resulting in significant performance improvements.
Adapting Large Language Models for Recommendation (LLM4Rec) has shown promising results. However, the challenges of deploying LLM4Rec in real-world scenarios remain largely unexplored. In particular, recommender models need incremental adaptation to evolving user preferences, while the suitability of traditional incremental learning methods within LLM4Rec remains ambiguous due to the unique characteristics of Large Language Models (LLMs).
In this study, we empirically evaluate two commonly employed incremental learning strategies (full retraining and fine-tuning) for LLM4Rec. Surprisingly, neither approach shows significant improvements in the performance of LLM4Rec. Instead of dismissing the role of incremental learning, we attribute the lack of anticipated performance enhancement to a mismatch between the LLM4Rec architecture and incremental learning: LLM4Rec employs a single adaptation module for learning recommendations, limiting its ability to simultaneously capture long-term and short-term user preferences in the incremental learning context. To test this speculation, we introduce a Long- and Short-term Adaptation-aware Tuning (LSAT) framework for incremental learning in LLM4Rec. Unlike the single adaptation module approach, LSAT utilizes two distinct adaptation modules to independently learn long-term and short-term user preferences. Empirical results verify that LSAT enhances performance, thereby validating our speculation.
Copying tables from documents and applications without proper tabular support, such as PDF documents, web pages, or images, surprisingly remains a challenge. In this paper, we present Revilio, a novel neurosymbolic system for reconstructing tables when their column boundaries have been lost. Revilio addresses this task by detecting headers, generating an initial table sketch using a large language model, and using that sketch as a guiding representation during an enumerate-and-test strategy that evaluates syntactic and semantic table structures. We evaluate Revilio on a diverse set of datasets, demonstrating significant improvements over existing table parsing methods. Revilio outperforms traditional techniques in both accuracy and scalability, handling large tables with over 100,000 rows. Our experiments find an increase in reconstruction accuracy of 5.8-11.3% over both neural and symbolic baseline systems.
Semantic parsing that translates natural language queries to SPARQL is of great importance for Knowledge Graph Question Answering (KGQA) systems. Although pre-trained language models like T5 have achieved significant success in the Text-to-SPARQL task, their generated outputs still exhibit notable errors specific to the SPARQL language, such as triplet flips. To address this challenge and further improve the performance, we propose an additional pre-training stage with a new objective, Triplet Order Correction (TOC), along with the commonly used Masked Language Modeling (MLM), to collectively enhance the model's sensitivity to triplet order and SPARQL syntax. We also propose to verbalize the Internationalized Resource Identifiers (IRIs) during training. Our method achieves state-of-the-art performances on three widely-used benchmarks.
In recommendation systems, there has been a growth in the number of recommendable items (e.g., movies, music, products). When the set of recommendable items is large, training and evaluation of item recommendation models becomes computationally expensive. To lower this cost, it has become common to sample negative items. However, recommendation quality can suffer from biases introduced by traditional negative sampling mechanisms.
In this work, we demonstrate the benefits of correcting the bias introduced by sampling negatives. We first provide sampled-batch versions of the well-studied WARP and LambdaRank methods. Then, we show how these methods can benefit from improved ranking estimates. Finally, we evaluate recommendation quality after correcting the rank estimates and demonstrate that WARP and LambdaRank can be learned efficiently with negative sampling and our proposed correction technique.
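One common way to correct ranks estimated from sampled negatives, in the spirit of the correction discussed above, is to extrapolate the rank observed among m uniformly sampled negatives to the full catalogue of N items and feed the corrected rank into a WARP-style weight. The estimator below is a textbook-style sketch under these assumptions, not the paper's exact correction.

```python
import numpy as np

def estimated_rank(pos_score, neg_scores, catalogue_size):
    """Extrapolate a rank observed among m sampled negatives to the full catalogue.
    neg_scores: scores of negatives sampled uniformly from the catalogue."""
    m = len(neg_scores)
    violators = np.sum(neg_scores > pos_score)               # sampled items ranked above the positive
    return 1 + int(np.floor(violators * (catalogue_size - 1) / m))

def warp_weight(rank):
    """WARP-style rank weight: a truncated harmonic sum, so poorly placed
    positives (large rank) receive a larger gradient weight."""
    return float(np.sum(1.0 / np.arange(1, rank + 1)))

rng = np.random.default_rng(0)
neg = rng.normal(size=200)                                    # 200 sampled negative scores
r = estimated_rank(pos_score=1.2, neg_scores=neg, catalogue_size=100_000)
print(r, warp_weight(r))
```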
Traffic forecasting, crucial for urban planning, requires accurate predictions of spatial-temporal traffic patterns across urban areas. Existing research mainly focuses on designing complex spatial-temporal models to capture these dependencies. However, this field faces challenges related to data scarcity and model stability, which results in limited performance improvement. To address these issues, we propose Spatial-Temporal Masked AutoEncoders (STMAE), a plug-and-play framework designed to enhance existing spatial-temporal models on traffic prediction. STMAE operates in two stages. In the pretraining stage, an encoder processes partially visible traffic data produced by a dual-masking strategy, including biased random walk-based spatial masking and patch-based temporal masking. Subsequently, two decoders aim to reconstruct the masked counterparts from both spatial and temporal perspectives. The fine-tuning stage retains the pretrained encoder and integrates it with decoders from existing backbones to improve traffic forecasting accuracy. Our results on traffic benchmarks show that STMAE can largely enhance the forecasting capabilities of various spatial-temporal models.
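The patch-based temporal masking used in the pretraining stage can be sketched directly: split each series into fixed-length patches and zero out a random fraction of them, leaving the encoder only the visible part. Patch length and mask ratio are illustrative choices, and the biased random-walk spatial masking is not shown.

```python
import numpy as np

def patch_temporal_mask(series, patch_len=12, mask_ratio=0.5, seed=0):
    """series: (T, n_nodes). Returns the masked series and a boolean patch mask."""
    rng = np.random.default_rng(seed)
    T, n = series.shape
    n_patches = T // patch_len
    n_masked = int(round(mask_ratio * n_patches))
    masked_patches = rng.choice(n_patches, size=n_masked, replace=False)

    masked = series.copy()
    mask = np.zeros(n_patches, dtype=bool)
    for p in masked_patches:
        masked[p * patch_len:(p + 1) * patch_len, :] = 0.0   # hide the whole patch
        mask[p] = True
    return masked, mask

x = np.random.default_rng(1).normal(size=(288, 207))   # one day of 5-minute readings, 207 sensors
x_masked, mask = patch_temporal_mask(x)
print(mask.sum(), "of", mask.size, "patches masked")
```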
Traffic Signal Control (TSC), a pivotal and challenging research area in the transportation domain, aims to alleviate congestion at urban intersections by optimizing vehicular flows from different inflow directions. While large efforts have focused on using Reinforcement Learning (RL) based methods to tackle the TSC problem, such methods face constraints including unpredictable training duration and the risks of online exploration, limiting their real-world deployment. Recently, offline RL has emerged as a new solution by transitioning from learning through online interactions to deriving policies from pre-collected datasets, which guarantees a safer and more efficient learning process. However, existing offline methods overlook the crucial temporal and spatial intricacies among data from different traffic signals at different timesteps, which leads to suboptimal performance. To this end, we present an innovative formulation of the offline TSC problem by introducing a spatio-temporal graph to model the historical Markov Decision Process sequences across all traffic signals within the road network. Along this line, we propose STLight, a novel spatio-temporal sequence modeling approach that predicts optimal actions for the signals from historical data, accounting for the inherent inter-dependencies among them. Specifically, we incorporate a spatio-temporal encoder to represent states, actions, and returns by capturing dynamic and spatially dependent information. The ordered space-time-aware representations are further fed to an Action Decoder to predict signal phase actions in an auto-regressive manner, accounting for the hidden dependencies between the actions and the reward and state tokens. Furthermore, to adaptively handle tasks with different levels of congestion, we incorporate space-aware return-based contrastive learning to automatically differentiate data samples with disparate traffic flow patterns. Finally, extensive experiments conducted on two public real-world traffic datasets clearly demonstrate the superior performance of the proposed model over both state-of-the-art online and offline traffic signal control baselines.
Survival analysis models time-to-event distributions with censorship. Recently, deep survival models using neural networks have dominated due to their representational power and state-of-the-art performance. However, their "black-box" nature hinders interpretability, which is crucial in real-world applications. In contrast, "white-box" tree-based survival models offer better interpretability but struggle to converge to global optima due to greedy expansion. In this paper, we bridge the gap between previous deep survival models and traditional tree-based survival models through deep rectified linear unit (ReLU) networks. We show that a deliberately constructed deep ReLU network (termed SurvReLU) can harness the interpretability of tree-based structures together with the representational power of deep survival models. Empirical studies on both simulated and real survival benchmark datasets show the effectiveness of the proposed SurvReLU in terms of both performance and interpretability. The code is available at https://github.com/xs018/SurvReLU.
In this work we propose to adapt Learned Sparse Retrieval, an emerging approach in IR, to text-centric content-based recommendations, leveraging the strengths of transformer models for an efficient and interpretable user-item matching. We conduct extensive experiments, showing that our LSR-based recommender, dubbed STAR, outperforms existing dense bi-encoder baselines on three recommendation domains. The obtained word-level representations of users and items are easy to examine and result in over 10x more compact indexes.
In this work, we aim to understand the general public's perception of societal issues related to the current climate crisis and the COVID-19 pandemic on Twitter (X). Social media discussions on such matters often spread misleading information, resulting in delays to initiatives proposed by governments or policymakers. Hence, we focus on extracting relevant information from conversations on climate change and COVID that could help authorities curb the spread of potentially biased information, by proposing the classification tasks of relevance detection (RD) and information categorization (IC). We first curate the datasets for the RD and IC tasks for the climate domain and extend the COVID-19 benchmark attention-worthy Twitter dataset for the IC task through manual annotation. We initially conduct experiments with LLMs and observe that LLMs can extract the relevant information in zero- and few-shot settings based on multi-perspective reasoning in the form of cognitive empathy and ethical standards, but still perform worse than fine-tuned small language models. Based on these initial findings, we conclude that LLMs may not be the best extractors of relevant information, but they induce cognitive empathy and ethical reasoning that can intuitively guide supervised models. To realize this idea, we develop a cognitive empathy and ethical reasoning-based multi-tasking pipelined network for the RD and IC tasks. Our proposed approach provides valuable insights that could be useful in real-world scenarios for governments, policymakers, and other researchers to decode the overall public outlook on societal issues.
This paper presents our analysis of neural IR models, focusing particularly on over-penalization for extra information (OPEX), a phenomenon where the addition of a sentence to a document causes an unreasonable decline in the document's rank. We found that neural IR models suffered from OPEX, especially when the added sentence is similar to the other sentences in the document. To mitigate OPEX, we propose applying a window-based scoring approach that segments a document and aggregates the scores of the segments to compute the overall document score. We theoretically proved that the window-based scoring approach fully suppresses OPEX in the extreme case where each segment contains only a single sentence, and empirically showed that this approach mitigates OPEX. The code is available at https://github.com/argonism/OPEX.
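The mitigation reduces to a simple scoring wrapper: split the document into sentence windows, score each window against the query, and aggregate (here with a max, so one added sentence cannot drag the whole document down). The sketch uses a placeholder term-overlap scorer where a neural ranking model would normally go; `score_fn`, the window size, and the max aggregation are illustrative choices, not the paper's exact setup.

```python
def window_scores(query, sentences, score_fn, window_size=3, stride=1):
    """Score a document in overlapping sentence windows and aggregate with max,
    so extra sentences outside the best window cannot lower the document score."""
    windows = [
        " ".join(sentences[i:i + window_size])
        for i in range(0, max(1, len(sentences) - window_size + 1), stride)
    ]
    return max(score_fn(query, w) for w in windows)

# placeholder scorer (term overlap); a neural IR model would be dropped in here
def overlap_score(query, text):
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)

doc = ["neural ranking models score documents",
       "they can over-penalize extra sentences",
       "window based scoring mitigates this effect"]
print(window_scores("neural ranking models", doc, overlap_score))
```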
Online grooming is the process of an adult initiating a sexual relationship with a minor through online conversation platforms. While neural models have been developed to detect such incidents, their practical value in real-world settings remains unclear because of closed, irreproducible, and weak evaluation methodologies under the sparse distribution of grooming conversations in training datasets, such as undermining recall in favor of precision. Furthermore, proposed models overlook characteristic features of grooming in online conversations, including the number of participants, message exchange patterns, and temporal signals such as the elapsed time between messages. In this paper, we first contribute Osprey, an open-source library that supports a standard pipeline and experimental details, incorporating canonical neural models and a variety of vector representation learning methods for conversations while accommodating new models and training datasets. Further, we incorporate conversation features into the models to improve recall while maintaining precision. Our experiments across neural baselines and vector representations of conversations demonstrate that recurrent neural models, particularly GRUs, applied to the sequence of pretrained transformer-based embeddings of messages in a conversation, together with conversation features, obtain state-of-the-art performance, achieving the best recall with competitive precision. Osprey is available at https://github.com/fani-lab/Osprey/tree/cikm24.
In the field of ancient Chinese text, extracting and analysing temporal and geographic information is crucial for understanding the personal experiences of historical figures, the development of historical events, and the overall historical background. Currently, named entity recognition (NER) strategies such as BERT+CRF are used to extract temporal and geographic information from ancient Chinese text. However, ancient Chinese text covers a vast time span, and temporal and geographic entities constantly evolve and change, making them difficult to extract. This paper proposes a temporal and geographic extraction model for ancient Chinese text, enhanced by a time-series external knowledge base. The extraction of proper nouns and of general structures is divided into two independent networks. An external database is applied to enhance the extraction of proper nouns and to reduce noise for general structure inference. We constructed address trees and chronological tables containing commonly used places and time-related keywords from different periods and collected 12,000 texts spanning 3,000 years for extensive training. Overall, our research highlights the importance of external knowledge bases for ancient Chinese NER and provides new ideas for research in related fields.
Logo embedding models convert the product logos in images into vectors, enabling their use for logo recognition and detection within e-commerce platforms. This facilitates the enforcement of intellectual property rights and enhances product search capabilities. However, current methods treat logo embedding as a purely visual problem. A noteworthy issue is that visual models capture many features beyond the logo itself. Instead, we view this as a multimodal task, using text as auxiliary information to help the visual model understand the logo. The emerging Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in both visual and textual understanding. Inspired by this, we propose an approach, FashionLOGO, that explores how to prompt MLLMs to generate appropriate text for product images, which can help visual models achieve better logo embeddings. We adopt a cross-attention transformer block that enables the visual embedding to automatically learn supplementary knowledge from the textual embedding. Our extensive experiments on real-world datasets prove that FashionLOGO is capable of generating generic and robust logo embeddings, achieving state-of-the-art performance on all benchmarks.
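The cross-attention transformer block described above can be sketched as visual patch embeddings querying text token embeddings, followed by the usual residual and feed-forward sublayers. Dimensions, head count, and the exact block layout are assumptions for illustration, not the FashionLOGO implementation.

```python
import torch
import torch.nn as nn

class VisualTextCrossAttention(nn.Module):
    """Visual patch embeddings query text token embeddings, so the logo
    representation can absorb complementary textual knowledge."""
    def __init__(self, dim=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, visual, text):
        # visual: (B, n_patches, dim), text: (B, n_tokens, dim)
        attended, _ = self.attn(query=visual, key=text, value=text)
        h = self.norm1(visual + attended)          # residual + norm
        return self.norm2(h + self.ffn(h))         # pooled later into the logo embedding

block = VisualTextCrossAttention()
out = block(torch.randn(2, 49, 256), torch.randn(2, 32, 256))
print(out.shape)  # torch.Size([2, 49, 256])
```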
Large language models encapsulate knowledge and have demonstrated superior performance on various natural language processing tasks. Recent studies have localized this knowledge to specific model parameters, such as the MLP weights in intermediate layers. This study investigates the differences between entity and relational knowledge through knowledge editing. Our findings reveal that entity and relational knowledge cannot be directly transferred or mapped to each other. This result is unexpected, as logically, modifying the entity or the relation within the same knowledge triplet should yield equivalent outcomes. To further elucidate the differences between entity and relational knowledge, we employ causal analysis to investigate how relational knowledge is stored in pre-trained models. Contrary to prior research suggesting that knowledge is stored in MLP weights, our experiments demonstrate that relational knowledge is also significantly encoded in attention modules. This insight highlights the multifaceted nature of knowledge storage in language models, underscoring the complexity of manipulating specific types of knowledge within these models.
Advanced deep learning-based face recognition models require extensive datasets for optimal performance. However, increasing privacy concerns drive the limitation of face image access on devices to prevent personal information leaks. To address this, federated learning, which allows decentralized data collaboration, has gained popularity. However, traditional federated learning methods risk privacy by transmitting identity proxies to servers. We propose DP-FedFace, a privacy framework specifically designed for a realistic scenario where each client contains only the owner's face images (one identity per client). It uses the difference between human and model perception to eliminate visualization-critical low-frequency components, thus protecting user privacy. We also introduce a novel, learnable privacy cost allocation mechanism that optimizes allocation strategies and adds noise to frequency-domain features. Extensive experiments demonstrate that DP-FedFace maintains high recognition accuracy while offering robust privacy protection.
Temporal Knowledge Graph Forecasting (TKGF) aims to forecast missing entities or relations at a specific timestamp when only historical information is observed. It is crucial to accurately identify the historical information of complex temporal relational graphs related to the query. Existing works, e.g., TANGO, have applied Neural Ordinary Differential Equations (NODEs) to TKGF. However, TANGO has two limitations. First, TANGO observes historical facts with only one timestamp at each step, leading to a long-term forgetting problem. Second, TANGO gives the same weight to the entire history graph, including facts that are not relevant to the query. To tackle these limitations, this paper utilizes an Attentional Neural Integral Equation for TKGF (tIE), enabling global interaction between query-related historical graph sequences. To achieve this, we employ a Relational Graph Convolutional Network and a Fourier-type Transformer to model the graph structure and temporal evolution of the TKG. An Iterative Integral Equation Solver is exploited to enhance the accuracy and robustness of the numerical solutions. The proposed method outperforms baseline models on several metrics and in inference speed on four benchmark datasets, especially on the long-horizon link forecasting task with irregular time intervals.
Malevolence detection in dialogues aims to identify harmful or inappropriate utterances, which significantly impact dialogue quality and user satisfaction. Although existing studies have shown promising performance by modeling interaction patterns from dialogue history, various malevolence-invoking factors, such as fine-grained emotions, evolving topics, and user profiles, are often overlooked. To comprehensively consider these factors, we propose a hypergraph fusion model employing multi-view LLM-driven prompts for malevolence detection in dialogues. Our model integrates emotion context, topic context, user profile context, and interaction context, utilizing hypergraphs to establish high-order contextual relationships from multiple views for deducing malevolence-invoking semantics. Experimental results on two benchmark datasets demonstrate that our model achieves state-of-the-art performance.
Knowledge Graphs (KGs) have proven their effectiveness in recommendation systems. Recent knowledge-aware recommendation methods, which utilize graph neural networks and contrastive learning, overlook two issues: 1) they neglect to model the latent relationships between users and entities; 2) traditional cross-view contrastive learning is insufficient because its domain cannot cover all nodes in a graph. To address these issues, we propose a novel model named Knowledge-aware User Preference Network (KUPN). Specifically, KUPN first constructs a relational preference view containing a new graph, the User Preference Graph (UPG), to model the potential relationships between users and entities. Then, we adopt a novel attentive information aggregation scheme to learn the UPG. In addition, we obtain supplementary semantic information about users and entities from the collaborative knowledge view, which consists of the KG and the Interaction Graph (IG). Finally, we apply cross-view contrastive learning over complete domains between the dynamic relational preference view and the collaborative knowledge view. Extensive experiments on three real-world datasets demonstrate the superiority of KUPN against state-of-the-art methods.
Bloom filter (BF) encodings are a proven method for comparing the similarity of records from multiple databases while maintaining privacy. This process, known as privacy-preserving record linkage (PPRL), is computationally expensive, especially with large datasets. To address this challenge, we observe that BF encodings often exhibit sparsely filled patterns. Leveraging this insight, we introduce SparseBF, a scalable data structure that is space-optimized and maintains fast computation speed for PPRL. Compared to typical BF encodings, SparseBF brings three improvements. First, SparseBF employs a hybrid storage scheme that selects the optimal storage component for BF encodings. Second, SparseBF utilizes adaptive compressed sparse row storage to achieve lossless space compression, both when BFs are sparsely occupied and when they are densely populated. Third, SparseBF supports SIMD vector instructions to optimize record linkage speed. Experiments show that, in sparsely filled scenarios, SparseBF achieves up to 2.1x faster record linkage computation than the existing solution, while delivering up to 70.5% savings in storage space.
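The space/similarity trade-off behind SparseBF can be illustrated with a toy hybrid encoding: store only the sorted positions of set bits when a filter is sparsely filled, fall back to a packed dense bit array otherwise, and compute the usual Dice similarity for PPRL from either form. The layout, the threshold, and the absence of SIMD are all simplifications; this is not the actual SparseBF data structure.

```python
import numpy as np

class HybridBF:
    """Keep a Bloom-filter encoding dense when heavily filled, otherwise keep
    only the sorted positions of set bits (CSR-like column indices)."""
    def __init__(self, bits, sparse_threshold=0.25):
        bits = np.asarray(bits, dtype=bool)
        self.m = bits.size
        if bits.mean() <= sparse_threshold:
            self.kind = "sparse"
            self.data = np.flatnonzero(bits).astype(np.uint32)   # set-bit positions
        else:
            self.kind = "dense"
            self.data = np.packbits(bits)                        # 1 bit per position

    def positions(self):
        if self.kind == "sparse":
            return self.data
        return np.flatnonzero(np.unpackbits(self.data)[:self.m])

    def popcount(self):
        return len(self.positions())

    def dice(self, other):
        """Dice similarity, the usual PPRL comparison measure."""
        common = np.intersect1d(self.positions(), other.positions(),
                                assume_unique=True).size
        denom = self.popcount() + other.popcount()
        return 2 * common / denom if denom else 0.0

rng = np.random.default_rng(0)
bf1 = HybridBF(rng.random(1024) < 0.05)   # sparsely filled encoding
bf2 = HybridBF(rng.random(1024) < 0.05)
print(bf1.kind, bf2.kind, round(bf1.dice(bf2), 3))
```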
Current studies mainly rely on overlapping users (who leave trajectories in both cities) as a medium for learning travelers' preferences in the target city. However, it is unrealistic to find overlapping users when two cities are far apart, so this problem suffers from severe data scarcity. Besides, due to the mixture of mobility patterns from both cities, directly applying a model trained in the source city may lead to negative transfer in the target city. To tackle these issues, in this paper, we conceive and implement a novel framework called CrossPred to predict the cross-city mobility of long-distance travelers in the target city. Specifically, POI features including popularity, textual description, spatial distribution, and sequential pattern are considered for cross-city POI matching, which further acts as a vital link for jointly modeling native user mobility preferences in both the source and target cities. Maximum Mean Discrepancy (MMD) is adopted to strengthen the POI features shared among cities and weaken the city-specific POI features, thereby promoting cross-city POI feature matching. Extensive experiments on real-world datasets demonstrate the effectiveness and superiority of the proposed framework.
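Maximum Mean Discrepancy, used above to pull shared POI features together across cities, has a compact empirical form: mean within-source kernel similarity plus mean within-target similarity minus twice the cross similarity. The sketch below uses an RBF kernel and the simple biased estimator; the kernel choice and feature dimensions are illustrative.

```python
import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Squared MMD with an RBF kernel: mean_k(X,X) + mean_k(Y,Y) - 2*mean_k(X,Y).
    Small values indicate the two POI feature distributions are well aligned."""
    def k(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
source_poi = rng.normal(size=(200, 16))              # source-city POI features
target_poi = rng.normal(loc=0.3, size=(150, 16))     # target-city POI features
print(rbf_mmd2(source_poi, target_poi))
```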
In real-world applications, users express different behaviors when they interact with different items, including implicit click/like interactions and explicit comment/review interactions. Nevertheless, almost all recommender works focus on describing user preferences through implicit click/like interactions in order to find synergies among users. For the content-based explicit comment/review interactions, some works attempt to mine semantic knowledge from them to enhance recommender models. However, they still neglect the following two points: (1) content semantics are universal world knowledge; how do we extract multi-aspect semantic information to empower different domains? (2) the user/item ID feature is a fundamental element of recommender models; how do we align the ID and content semantic feature spaces? In this paper, we propose a 'plugin' semantic knowledge transferring method, LoID, which includes two major components: (1) LoRA-based large language model pretraining to extract multi-aspect semantic information; (2) an ID-based contrastive objective to align the feature spaces. We conduct extensive experiments against SOTA baselines to demonstrate the superiority of our method, LoID.
The need for explainability in time-series classification models has been increasing. Counterfactual explanations recommend how to modify the features of an original instance so that the prediction of a given classifier flips to the desired class. Since features in a time series are temporally dependent, interpretability is improved by considering intervals in which the counterfactual may deviate from the original instance. In this study, we propose a model-agnostic counterfactual generation method (CEI) that jointly learns these intervals and the counterfactual. Furthermore, CEI can generate a counterfactual tailored to a directly specified, limited number of intervals. We mathematically formulate CEI as a continuous optimization problem and demonstrate its effectiveness on the UCR datasets.
In human reading and communication, individuals tend to engage in geospatial reasoning, which involves recognizing geographic entities and making informed inferences about their interrelationships. To mimic this cognitive process, current methods either utilize conventional natural language understanding toolkits or directly apply models pretrained on geo-related natural language corpora. However, these methods face two significant challenges: i) they do not generalize well to unseen geospatial scenarios, and ii) they overlook the importance of integrating geospatial context from geographical databases with linguistic information from the Internet. To handle these challenges, we propose GeoReasoner, a language model capable of reasoning over geospatially grounded natural language. Specifically, it first leverages Large Language Models (LLMs) to generate a comprehensive location description based on linguistic and geospatial information. It also encodes direction and distance information into spatial embeddings by treating them as pseudo-sentences. Consequently, the model is trained on both anchor-level and neighbor-level inputs to learn geo-entity representations. Extensive experimental results demonstrate GeoReasoner's superiority in three tasks: toponym recognition, toponym linking, and geo-entity typing, compared to state-of-the-art baselines.
Icons are frequently employed in children-oriented information systems due to children's limited literacy. However, the inherent semantic distances of icons, which may influence their affordance to children, are often overlooked in the development of such systems and in related research. In this study, we apply semantic distance to measure the explicitness of icons in children-oriented book search, utilizing self-developed icons tailored for indexing picture books. We first gathered data from children through questionnaires to assess the perceived semantic distance of each icon. Subsequently, we conducted eye-tracking experiments with 50 preschool children, measuring their search accuracy, response time, and eye movement patterns while using icons to locate specific picture books. Our findings indicate that preschool children find it easier to search with icons of close semantic distance and with single icons. Additionally, the ability to use icons with distant semantic distances and combined icons improves significantly with age. These findings may contribute to the development of more effective and child-friendly information search systems.
Encrypted traffic classification is essential for network security and management. However, the encrypted nature makes it challenging to extract representative features from raw traffic data. Existing end-to-end methods ignore byte correlations within packets and potential correlations among packets, hindering the learning of real traffic semantics and leading to suboptimal performance. This paper proposes MsETC, a multi-scale contrastive attention representation learning method for encrypted traffic classification. MsETC divides the raw packet byte sequence into multi-scale patches and then extracts dual views for contrastive learning from both the inter-patch and intra-patch perspectives. This allows the model to capture correlations among bytes within a packet as well as the potential interactions between packets. Extensive experiments on real-world datasets demonstrate that the proposed method achieves superior classification performance with lower complexity.
Recent research on the robustness of Graph Neural Networks (GNNs) under noise or attacks has attracted great attention due to its importance in real-world applications. Most previous methods explore a single noise source, recovering corrupted node embeddings using reliable structures or developing structure learning with reliable node features. However, noise and attacks may come from both structures and features in graphs, making graph denoising a challenging dilemma. In this paper, we develop a unified graph denoising (UGD) framework to unravel the deadlock between structure and feature denoising. Specifically, a high-order neighborhood proximity evaluation method is proposed to recognize noisy edges, considering that features may be perturbed simultaneously. Moreover, we propose to refine noisy features with reconstruction based on a graph auto-encoder. An iterative updating algorithm is further designed to optimize the framework and acquire a clean graph, thus enabling robust graph learning for downstream tasks. Our UGD framework is self-supervised and can be easily implemented as a plug-and-play module. We carry out extensive experiments, which prove the effectiveness and advantages of our method. Code is available at https://github.com/YoungTimmy/UGD.
Joint Multimodal Entity-Relation Extraction (JMERE) aims to extract entity-relationship triples in texts from given image-text pairs. As a joint multimodal information extraction task, it has attracted increasing research interest. Previous works on JMERE typically utilize graph networks to align textual entities and visual objects and achieve promising performance. However, these methods do not account for the inconsistency between text and image, and such direct alignment can limit the performance of JMERE models. In this paper, we propose a Consistency-adaptive text-image Alignment Generation (CAG) framework for various text-image consistency scenarios. Specifically, we propose a Consistency Factor (CF) to measure the consistency between images and texts. We also design consistency-adaptive contrastive learning based on CF, which can reduce the impact of inconsistent visual and textual information. Additionally, we adopt JMERE-specific instruction tuning for better entity-relationship triplet generation. Experimental results on the JMERE dataset demonstrate that our proposed CAG is effective and achieves state-of-the-art performance.
In recommender systems, learning high-quality user and item representations is crucial for predicting user preferences. However, there are various confounding factors in observational data, resulting in data bias, which hinders the learning of user and item representations. Recent work proposed to use uniform data to alleviate the bias problem. However, these methods fail to learn pure representations for unbiased prediction, i.e., representations that are not affected by confounding factors. This paper introduces a novel disentangled framework, named CDLRec, for learning unbiased representations, leveraging uniform data as a supervisory signal for disentangling. Furthermore, to address the scarcity of uniform data, contrastive learning is utilized to implement disentanglement by providing augmented samples. Specifically, two contrastive strategies are designed based on different ways of sampling positives and negatives. Extensive experiments are conducted over two real-world datasets, and the results demonstrate the superior performance of our proposed method.
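The augmentation-based contrastive term could resemble a standard in-batch InfoNCE objective, sketched below; CDLRec's actual strategies differ in how positives and negatives are sampled, and the temperature value here is an assumption.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.2):
    """Generic InfoNCE over in-batch negatives: each row's positive is its
    augmented counterpart, every other row serves as a negative."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(a.size(0))          # diagonal entries are positives
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(32, 64), torch.randn(32, 64))
print(loss.item())
```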
Graphs serve as fundamental representations for a diverse array of complex systems, capturing intricate relationships and interactions between entities. In many real-world scenarios, graphs exhibit non-homophilous, or heterophilous, characteristics, challenging traditional graph analysis methods rooted in homophily assumptions. Recent heterophilous methods frequently struggle with noise in node attributes, which can degrade the quality of graph representations and affect downstream task performance. Common graph augmentations, while useful, often introduce bias and irrelevant noise. This paper proposes a novel method, Robust Heterophily Graph Learning via Uniformity Augmentation (RHGL-UA), which incorporates uniformity in the augmentation process through controlled random perturbations. This approach ensures a more uniform distribution of representations across different layers of the model. By adapting to data variations and learning more diverse information, RHGL-UA significantly improves performance on downstream tasks and stands out as the first practical robust heterophily graph method using representation augmentation with a theoretical guarantee. Extensive experiments demonstrate the merit of our proposed method.
High-quality multimodal training data is of critical importance for improving multimodal model performance. However, the utilization of web-crawled vision-caption pairs is hindered by the presence of noise and irrelevance, as well as a lack of Chinese data. Large Language Models (LLMs) and Large Multimodal Models (LMMs) have demonstrated promising performance in cross-modal understanding and generation. In light of this, we propose a Chinese visual captioning pipeline for the synthesis of high-quality data. Our pipeline comprises two phases: the initial training of an encoder for visual understanding, and the subsequent fine-tuning of a captioning model in a two-stage iterative human-in-the-loop process, where the captioning model incorporates the pre-trained vision encoder and LLM via a visual cross-attention querying transformer. Extensive experiments have been conducted to validate our framework, including both quantitative and qualitative evaluation of captions generated from images and videos. The synthesis pipeline has been integrated into the ad image creative generation process in Baidu Search Ads, resulting in enhanced capabilities in prompt following.
In this paper, we introduce a novel BART-based Hierarchical Attentional Ordering Network (BHAONet), aiming to address the coherence modeling challenge within paragraphs, which stands as a cornerstone in comprehension, generation, and reasoning tasks. By leveraging the pre-trained BART model to encode the entire sequence, we can effectively exploit global semantic and contextual information. Moreover, the token-level and sentence-level hierarchical attentional layers are incorporated to encourage the model to focus on features at various levels of granularity. In addition, a transformer-guided pointer network is developed for decoding. Extensive experiments conducted on benchmark datasets demonstrate the effectiveness and superiority of our proposed model.
Chinese Spelling Correction (CSC) aims to detect and correct errors in a sentence. Researchers have recently begun to investigate the use of pre-trained language models for this task, replacing tokens with similar characters based on a confusion set rather than masking tokens during pre-training. A significant limitation of this method is that pre-training with single tokens cannot effectively simulate natural errors, such as full spelling. Furthermore, the confusion set consists of single characters, which lacks the contextual relationship between characters. This leads to lower-quality generated pseudo-labeled samples. To this end, we contribute a novel approach to constructing a confusion corpus, which can automatically generate high-quality spelling errors. The construction of our confusion set is based on user behavior patterns, aiming to identify characters that are easily confused in similar contexts. We then introduce a Span Confusion Pre-Training (SCPre) strategy for Chinese Spelling Correction. The span confusion strategy replaces characters, words, and phrases in the text according to a large confusion set. On the constructed corpus, different models are trained and evaluated for CSC with respect to real-world datasets. Quantitative experiments on these datasets show that our approach empirically achieves unprecedented performance.
Text-as-data approaches, increasingly popular in various domain-specific applications and research, have often relied on manually selected keywords or annotations. Although these efforts are labor-intensive, expensive, and time-consuming, their effectiveness is not always guaranteed, especially in the early stages of research. This predicament raises the question of the extent to which large language models (LLMs) can aid in verifying the potential of a nascent research idea. This paper explores the reliability of LLM-suggested keywords in the automatic construction of the Economic Policy Uncertainty (EPU) index. Our findings confirm that LLMs can effectively automate the construction of the EPU index. Furthermore, we delve into the potential of LLMs to enhance the indicator construction process.
Real-time data processing systems generate huge amounts of data that need to be classified. The volume, variety, velocity, and veracity (uncertainty) of this data necessitate new approaches and the adaptation of existing classification methods. Moreover, the arriving data can belong to more than one class at the same time. As the number of labels grows larger, a significant portion of the multi-label data stream classification methods become computationally inefficient. We propose a novel online approach: the Prioritized Binary Transformation (PBT) method, which can classify data with large numbers of labels by ordering the labels using Principal Component Analysis (PCA) within a fixed-size window. This order is then used to transform the label vectors for classification. We perform an empirical analysis on 12 datasets and compare PBT to four prominent baselines using four evaluation metrics. PBT achieves the best average ranking in three of the four evaluation metrics. Moreover, we investigate efficiency in terms of average execution time per data item and memory consumption, where PBT achieves the second- and first-best average rankings, respectively.
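The windowed ordering step can be sketched as below: labels are ranked by their loading on the first principal component of the label matrix seen in a fixed-size window, and the label vectors are then rewritten along that order. The ordering follows the abstract; the specific binary transformation shown (a prefix-OR along the learned order) and the use of scikit-learn's PCA are illustrative assumptions, not PBT's exact procedure.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_label_order(window_labels: np.ndarray) -> np.ndarray:
    """Order labels by the magnitude of their loading on the first principal
    component of the label matrix observed in the current window."""
    pca = PCA(n_components=1).fit(window_labels)
    return np.argsort(-np.abs(pca.components_[0]))   # most-loaded label first

def transform_labels(y: np.ndarray, order: np.ndarray) -> np.ndarray:
    """Illustrative transformation: reorder labels, then take a prefix-OR."""
    reordered = y[:, order]
    return np.maximum.accumulate(reordered, axis=1)

window = (np.random.rand(100, 6) > 0.7).astype(int)  # toy multi-label window
order = pca_label_order(window)
print(order, transform_labels(window[:5], order))
```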
Customers reach out to online live chat agents with various intents, such as asking about product details or requesting a return. In this paper, we propose the problem of predicting user intent from browsing history and address it through a two-stage approach. The first stage classifies a user's browsing history into high-level intent categories. Here, we represent each browsing history as a text sequence of page attributes and use the ground-truth class labels to fine-tune pretrained Transformers. The second stage provides a large language model (LLM) with the browsing history and predicted intent class to generate fine-grained intents. For automatic evaluation, we use a separate LLM to judge the similarity between generated and ground-truth intents, which closely aligns with human judgments. Our two-stage approach yields significant performance gains compared to generating intents without the classification stage.
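A minimal sketch of how a browsing history might be serialized into the text sequence of page attributes that the stage-one Transformer consumes; the attribute names and separator token are assumptions for illustration, not the paper's exact schema.

```python
def history_to_text(pages):
    """Serialize a browsing history (list of page-attribute dicts) into one
    text sequence that can be fed to a pretrained Transformer classifier."""
    parts = []
    for p in pages:
        parts.append(f"[{p.get('page_type', 'unknown')}] "
                     f"{p.get('title', '')} | category: {p.get('category', '')}")
    return " [SEP] ".join(parts)

history = [
    {"page_type": "product", "title": "Noise-cancelling headphones", "category": "audio"},
    {"page_type": "help", "title": "Return policy", "category": "returns"},
]
print(history_to_text(history))
```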
Query understanding is an essential component of search systems for improving recall. Unlike prior works focusing on word expansions, in this paper we leverage the comprehension ability of LLMs to generate detailed queries from a global semantic perspective. To this end, we introduce GaQR, an efficient query rewriter that reformulates a question into several queries using Chain of Thought (CoT) and is made more efficient through knowledge distillation. Specifically, we first prompt a teacher model to generate indicative queries by considering answer generation one step ahead. Then, we filter out low-quality queries by validating the effectiveness of all generated queries in retrieving useful passages. Finally, we distill a student rewriter based on the verified results to improve efficiency. Our experimental results demonstrate that the rewriter improves retrieval performance by 3% to 15% on the Miracl and NFCorpus datasets and shows good generalization ability across different retrieval methods. Moreover, the efficiency of the rewriter after knowledge distillation is improved by as much as 5 times. Code is available at https://github.com/youngbeauty250/GaQR.
In material science, the properties of crystalline materials largely depend on their structures, and the space group is a key descriptor of crystal structure. With the rapid advancement of deep learning, traditional manual structure analysis based on X-ray diffraction (XRD) has become cumbersome and is gradually being supplanted by neural networks. However, existing models are too simplistic and lack a comprehensive understanding of material structure. Our approach, XRDMamba, integrates chemical knowledge and presents a fresh crystal-plane perspective on XRD data. We also introduce a knowledge-driven model for space group identification tasks. We have thoroughly analyzed our approach through numerous experiments, observing its SOTA performance and excellent generalization capabilities. The code is available at https://github.com/baigeiguai/XRDMamba.
Previous user-item interaction graphs have typically focused on simple interactions between users and items, failing to identify the important effects of users' intents in these interactions. While recent studies have ventured into exploring intent relationships between users and items for modeling, they predominantly emphasize user preferences manifested in the interactions, overlooking knowledge-driven insights, thereby limiting the interpretability of intent. In this paper, we utilize the rich interpretable knowledge information in the knowledge graph to design a novel dual-level intent modeling framework called DIM. DIM aims to mine users' true intents, which usually include popularity preferences and personalized preferences. Therefore, we extract both the popular and personalized user preferences from attribute tuples within the knowledge graph at the global and local levels, respectively. Experimental results on three datasets demonstrate the superiority of DIM over various state-of-the-art approaches.
Large language models (LLMs) possess powerful contextual comprehension capabilities and have demonstrated remarkable success in conversational tasks. However, existing works that apply LLMs to the conversational text-to-SQL task suffer from repetitive mistakes, which prevents the full capabilities of LLMs from being realized. In this paper, we propose a novel approach that provides guidance through learning from mistakes. Specifically, the guidance offered by our approach includes tailored suggestions, corrective feedback, and personalized strategies aimed at improving learning outcomes. Furthermore, we employ chain-of-thought (CoT) prompting to incorporate guidance that cannot be used directly as prompts. Our method rigorously analyzes actual errors and strategizes on how to utilize the derived guidance effectively. Experimental results demonstrate that our approach improves the state-of-the-art (SOTA) performance metrics, increasing QEX performance from 66.3% to 70.9% (an absolute improvement of 4.6%) and IEX performance from 37.4% to 45.1% (an absolute improvement of 7.7%) on the CoSQL dataset.
Lower-dimensional temporal knowledge graph embedding (TKGE) models are crucial for practical applications and resource-limited scenarios, although existing models employ higher-dimensional embeddings in training. In this paper, we propose a new framework for distilling TKGE models following an easy-to-hard pedagogical principle. The framework utilizes a learnable curriculum temperature (CT) module to optimize and guide the knowledge distillation process dynamically, ensuring that the entire procedure adheres to the principle. It also employs a self-adaptive attention mechanism to achieve efficient transfer of knowledge from higher-dimensional models to lower-dimensional ones. Evaluation on various TKGE models and datasets demonstrates that the proposed approach significantly reduces the model's parameters without noticeably affecting its performance.
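A generic sketch of distillation with a learnable temperature, in the spirit of the curriculum temperature module: the temperature receives gradients, so an outer curriculum can move training from easy (smooth targets) to hard (sharp targets). This is not the paper's exact CT module; the clamp and initial value are assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature: torch.Tensor):
    """Soft-label distillation loss whose temperature is itself learnable."""
    t = temperature.clamp(min=1.0)
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t

temperature = torch.nn.Parameter(torch.tensor(4.0))
loss = kd_loss(torch.randn(8, 100), torch.randn(8, 100), temperature)
loss.backward()   # gradients flow into the temperature as well
print(temperature.grad)
```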
In this paper, we propose a new Momentum Contrastive Bidirectional Encoding network with Self-Distillation (MoCoBE-SD) to alleviate the data sparsity and noise issues in sequential recommendation by providing rich informative supervision from both sequence-level and item-level perspectives. In particular, a Momentum Contrastive Bidirectional Encoding (MoCoBE) network is first proposed by constructing a momentum-updated encoder based on an online bidirectional self-attention encoder, where a momentum contrastive learning task and a masked item prediction task are simultaneously optimized. Building upon MoCoBE, a well-elaborated Self-Distillation (SD) scheme is incorporated to further suppress the influence of noise. Specifically, a sequence encoder well trained by MoCoBE is adopted as the teacher encoder to provide refined supervision for masked item prediction, which constitutes our MoCoBE-SD framework. Extensive experiments on three public datasets show that MoCoBE-SD consistently outperforms existing state-of-the-art methods.
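The momentum-updated encoder can be sketched as an exponential moving average of the online encoder's parameters, as in standard momentum contrastive learning; the coefficient 0.999 is a common default assumed here, not necessarily the paper's setting.

```python
import torch

@torch.no_grad()
def momentum_update(online_encoder: torch.nn.Module,
                    momentum_encoder: torch.nn.Module,
                    m: float = 0.999):
    """Exponential-moving-average update of the momentum encoder from the
    online encoder's parameters."""
    for p_online, p_momentum in zip(online_encoder.parameters(),
                                    momentum_encoder.parameters()):
        p_momentum.data.mul_(m).add_(p_online.data, alpha=1.0 - m)

# Toy usage with two identically shaped encoders.
online = torch.nn.Linear(16, 16)
momentum = torch.nn.Linear(16, 16)
momentum.load_state_dict(online.state_dict())
momentum_update(online, momentum)
```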
Multi-label few-shot image recognition aims to identify multiple unseen objects using only a handful of examples. Recent methods typically tune pre-trained vision-language models with shared or class-specific prompts. However, they still have drawbacks: tuning a shared prompt is insufficient for all samples, especially when tasks are complex, while tuning a specific prompt for each class inevitably loses generalization ability, thus failing to capture diverse visual knowledge. To address these issues, we propose to meta-tune a generalized prompt pool, enabling each prompt to act as an expert for multi-label few-shot image recognition. Specifically, we first construct a diverse prompt pool to handle complex samples and tasks effectively. Then, a meta-tuning strategy is designed to learn meta-knowledge and transfer it from source tasks to target tasks, enhancing the generalization of prompts. Extensive experimental results on two widely used multi-label image recognition datasets demonstrate the effectiveness of our method.
Temporal Knowledge Graph (TKG) reasoning is a crucial task that aims to predict future facts based on historical information. In the process of reasoning over TKGs, we identify two types of facts that need to be predicted: 1) recurring facts and 2) unknown facts. While existing models emphasize reasoning about recurring facts, they inadvertently overlook the importance of unknown facts. To make better predictions on both types of facts, we introduce a novel TKG reasoning model, named Multi-view Recurrent Network (MV-NET), which generates different views to capture reasoning patterns for both recurring and unknown facts. Specifically, MV-NET comprises three views: a recurring history view that captures repetitive features, an exploring history view that focuses on exploring new information for unknown facts, and a full history view that assimilates historical information comprehensively. The historical information of each view is then encoded by a multi-view recurrent network. To better integrate the embeddings of the three views, we employ an adaptive scoring module, which consists of a query-aware attentive fusion mechanism to incorporate the predicted scores from the three views, thus obtaining fused scores for prediction. Extensive experiments on three commonly used datasets demonstrate the superiority of MV-NET compared to many state-of-the-art baselines.
In the information retrieval (IR) area, dense retrieval (DR) models use deep learning techniques to encode queries and passages into an embedding space to compute their semantic relations. It is important for DR models to balance both efficiency and effectiveness. Pre-trained language models (PLMs), especially Transformer-based PLMs, have been proven to be effective encoders for DR models. However, the self-attention component in Transformer-based PLMs results in a computational complexity that grows quadratically with sequence length, and thus exhibits a slow inference speed for long-text retrieval. Some recently proposed non-Transformer PLMs, especially Mamba-architecture PLMs, have demonstrated not only comparable effectiveness to Transformer-based PLMs on generative language tasks but also better efficiency due to linear time scaling with sequence length. This paper implements the Mamba Retriever to explore whether Mamba can serve as an effective and efficient encoder for DR models in IR tasks. We fine-tune the Mamba Retriever on the classic short-text MS MARCO passage ranking dataset and the long-text LoCoV0 dataset. Experimental results show that (1) on the MS MARCO passage ranking dataset and BEIR, the Mamba Retriever achieves comparable or better effectiveness compared to Transformer-based retrieval models, and the effectiveness grows with the size of the Mamba model; (2) on the long-text LoCoV0 dataset, the Mamba Retriever can extend to longer text lengths than its pre-trained length after fine-tuning on the retrieval task, and it has comparable or better effectiveness compared to other long-text retrieval models; (3) the Mamba Retriever has superior inference speed for long-text retrieval. In conclusion, the Mamba Retriever is both effective and efficient, making it a practical model, especially for long-text retrieval.
With the rapid development of social media, the wide dissemination of fake news on social media is increasingly threatening both individuals and society. One of the unique challenges for fake news detection on social media is how to detect fake news on future events. Recently, numerous fake news detection models that utilize textual information and the propagation structure of posts have been proposed. Unfortunately, most of the existing approaches can hardly handle this challenge since they rely heavily on event-specific features for prediction and cannot generalize to unseen events. To address this, we introduce the Future ADaptive Event-based Fake news Detection (FADE) framework. Specifically, we train a target predictor through an adaptive augmentation strategy and graph contrastive learning to obtain higher-quality features and make more accurate overall predictions. Simultaneously, we independently train an event-only predictor to obtain biased predictions. We further mitigate event bias by subtracting the event-only predictor's output from the target predictor's output to obtain the final prediction. Encouraging results from experiments designed to emulate real-world social media conditions validate the effectiveness of our method in comparison to existing state-of-the-art approaches.
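The debiasing step described above reduces to subtracting the event-only predictor's output from the target predictor's output; a minimal sketch follows, in which the scaling factor alpha is an added assumption rather than part of the described method.

```python
import torch

def debiased_logits(target_logits: torch.Tensor,
                    event_only_logits: torch.Tensor,
                    alpha: float = 1.0) -> torch.Tensor:
    """Remove event-specific bias by subtracting the event-only predictor's
    output from the target predictor's output before classification."""
    return target_logits - alpha * event_only_logits

final = debiased_logits(torch.randn(4, 2), torch.randn(4, 2))
prediction = final.argmax(dim=-1)   # fake vs. real decision per sample
print(prediction)
```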
P-Rank (Penetrating-Rank) is an appealing measure of structural similarity between objects based on graph topology. It recursively follows the principle that "two objects are considered similar if (a) they are referenced by similar objects and (b) they reference similar objects". The best-known algorithm for computing P-Rank employs two repeated Singular Value Decompositions (SVDs) coupled with the Woodbury matrix identity. However, this method does not scale well on billion-sized graphs. Worse yet, this algorithm only provides a linear approximation of the P-Rank model and cannot deliver accurate P-Rank values. In this paper, we propose P-Rank+, a fast and scalable algorithm for computing P-Rank similarities on large graphs with billions of edges. P-Rank+ leverages dimensionality reduction by performing only one SVD of the graph, integrated with Hadamard products in the reduced subspace. Moreover, we provide provable error guarantees for P-Rank+ computation. Experiments on various datasets validate that P-Rank+ is 1--3 orders of magnitude faster than the best-known competitor while achieving excellent scalability on massive graphs.
Accurately predicting Drug-Drug Interactions (DDIs) is critical to designing effective drug combination therapies. Recently, Artificial Intelligence (AI)-powered DDI prediction approaches have emerged as a new paradigm. However, most existing methods oversimplify the complex hierarchical structure within molecules and overlook the multi-source heterogeneous information external to molecules, limiting their modeling and predictive capabilities. To address this, we propose a Hierarchical Heterogeneous graph learning framework for DDI prediction, namely H2D. H2D employs an internal-to-external, local-to-global hierarchical perspective, exploiting intra-molecular multi-granularity structures and inter-molecular biomedical interactions to mutually enhance across hierarchical levels. Extensive experimental results demonstrate H2D's effectiveness on three real-world DDI prediction tasks (binary-class, multi-class, and multi-label). In sum, H2D achieves state-of-the-art performance in DDI prediction by leveraging the multi-scale graph structures, opening up new avenues in AI-powered DDI prediction.
Sequential recommender systems offer personalized suggestions by modeling users' interactions chronologically to capture dynamic user interest. Existing approaches typically fail to adequately describe the dynamics of the entire recommender system, including shifts in both user interest and item availability. To address this, we propose a simple yet effective framework with three key perspectives, tailored to the dynamics of the recommender system by fully exploiting time information. Firstly, we propose a dynamic candidate set construction approach to prevent the model from learning future interactions. Secondly, assuming that user behaviors remain consistent over short terms but may evolve over long terms, we employ an interval-weighted optimization target to model the correlation of users' historical interactions. Finally, we introduce a specialized time-aware attention module to enhance recommendations within specific temporal contexts. Extensive experiments demonstrate the effectiveness and generalizability of our framework. We make our code publicly available.
Embedding-based Retrieval (EBR) has been a fundamental component in sponsored-search systems, which retrieves high-quality products for the user's search query by encoding the information of the query, user, and product into dense embeddings. However, due to the characteristics of location-based services, user input queries suffer from two extremes: overly brief queries with vague intentions and lengthy queries with substantial noise, both of which make it challenging to discern the exact user search intent. In fact, e-consumers typically have a mental image of the product they intend to search for, reflecting their specific purchasing intentions. In this paper, we propose a Visual Imagination Enhanced Retrieval model (VIER) to explore the implicit imagery of users. Specifically, we design a visual imagination network to reconstruct imagery embeddings that capture both coarse-grained query commonalities and fine-grained user personalities. These pseudo-image representations are integrated with the query and user behavior to enhance the understanding of user search intentions for improved retrieval. According to online A/B tests on the Meituan sponsored-search system, our method significantly outperforms baselines in terms of revenue, clicks, and click-through rate.
Current answer sentence selection (AS2) approaches applied in open-domain question answering (ODQA) select answers by ranking a large set of candidates, i.e., sentences, extracted from the retrieved text. In this paper, we present Passage-based Extracting Answer Sentence In-place (PEASI), a novel answer selection model optimized for the Web-scale setting. This is a Transformer-based network that can jointly (i) rerank passages retrieved for a question and (ii) identify a probable answer from the top passages. We train PEASI with multi-task learning to share representations between the passage reranker and the answer sentence extractor. We construct a new large-scale QA dataset (WQA) consisting of 800,000+ labeled passages/sentences for 60,000+ questions. The experimental results show that PEASI outperforms the AS2 state of the art by 6.51% in accuracy on WQA, from 48.86% to 55.37%.
Finally, PEASI is efficient in computing answer sentences, requiring only ~20% of the inferences compared to the standard point-wise setting, which ranks all candidates. We will release WQA and the PEASI implementation, as we believe they can help advance research on QA services at Web scale.
Data analytics applications today often require processing heterogeneous data from different data models, including relational, graph, and text data, for more holistic analytics. While query optimization for single data models, especially relational data, has been studied for decades, there is surprisingly little work on query optimization for cross-model data analytics. Cross-model query optimization can benefit from the long line of prior work in query optimization in the relational realm, wherein cost-based and/or machine learning-based (ML-based) optimizers are common. Both approaches require a large and diverse set of query workloads to measure, tune, and evaluate a query optimizer. To the best of our knowledge, there are still no large public cross-model benchmark workloads, a significant obstacle for systems researchers in this space. In this paper, we take a step toward filling this research gap by generating new query workloads spanning relational and graph data, which are ubiquitous in analytics applications. Our approach leverages large language models (LLMs) via different prompting strategies to generate queries and proposes new rule-based post-processing methods to ensure query correctness. We evaluate the pros and cons of each strategy and perform an in-depth analysis by categorizing the syntactic and semantic errors of the generated queries. So far, we have produced over 4000 correct cross-model queries, the largest set ever. Our code, prompts, data, and query workloads will all be released publicly.
Predicting multivariate time series has been a topic of interest among researchers for a long time, especially in hydrological prediction. Due to the presence of extreme events, hydrological prediction requires capturing long-range dependencies and modeling rare but significant extreme values. Accurate prediction of these dependencies is often accomplished using complex models, such as stacked RNNs or transformer-based models, which can be computationally expensive and challenging to train. In addition, existing studies have identified a strong correlation between streamflow and rainfall data. However, the use of additional input data in these studies has often been insufficient, resulting in predictions with low accuracy. In this paper, we address these issues and propose LSPM, a Long Short-term Polar-Learning time series forecasting Model. LSPM learns polar representations through a feature reuse method called EDDU (Encoder Double-Decoder Unit). EDDU creatively incorporates exogenous input to generate long-term predictions based on these learned representations. To maximize the use of indicator sequences from exogenous data, LSPM enhances short-term predictions by a carefully designed loss function and integrates them into the overall forecast, improving robustness to short-term severe events. Experiments on four real-life hydrologic streamflow datasets demonstrate that LSPM significantly outperforms both state-of-the-art hydrologic time series prediction methods and general methods designed for long-term time series prediction.
Although graph neural networks (GNNs) can extract latent relationship-level knowledge among graph nodes and have achieved excellent performance in unsupervised scenarios, they are weak at learning instance-level knowledge compared with convolutional neural networks (CNNs). Moreover, the lack of a graph structure limits the extension of GNNs to non-graph datasets. To solve these problems, we propose a novel unsupervised multi-level knowledge fusion network. It successfully unifies instance-level and relationship-level knowledge on non-graph data by distillation from a pre-trained CNN teacher to a GNN student. Meanwhile, a sparse weighted strategy is designed to adaptively extract a sparse graph topology and extend the GNN to non-graph datasets. By optimizing the distillation loss, the "boosted" GNN student can learn the multi-level knowledge and extract more discriminative deep embeddings for clustering. Finally, extensive experiments show that it achieves excellent performance compared with current methods.
Anomalies in graphs involve attributes and structures and may occur at different levels (e.g., node or community). Existing GNN-based detection methods often merely focus on anomalies of single nodes or neighborhoods, making it hard to cope with complex and organized networks. Towards this, we propose SI-HGAD, a novel Graph Anomaly Detection (GAD) approach that utilizes hierarchical information to detect anomalies. Powered by structural information, SI-HGAD can mine an optimal graph abstraction while enabling hierarchical substructural modeling. Also, we design a Graph Transformer to mine multi-range structural and attribute patterns for nodes. The decoders reconstruct both the node attributes and the multi-level subgraphs in a bottom-up manner. Extensive experiments demonstrate the superiority of SI-HGAD.
Entity Recognition (ER) is a common natural language processing task encountered in a number of real-world applications. For common domains and named entities such as places and organisations, there exists sufficient high-quality annotated data, and foundational models such as T5 and GPT-3.5 also provide highly accurate predictions. However, for niche domains such as e-commerce and medicine with specialized entity types, there is a paucity of labeled data since manual labeling of tokens is often time-consuming and expensive, which makes entity recognition challenging for such domains. Recent works such as NEEDLE [48] propose hybrid solutions that efficiently combine a small amount of strongly labeled (human-annotated) data with a large amount of weakly labeled (distant supervision) data to yield superior performance relative to supervised training. The extensive noise in the weakly labeled data, however, remains a challenge. In this paper, we propose WeSDoM (Weak Supervision with Domain Models), which leverages pretrained encoder models from the same domain but different tasks to create domain ontologies that enable the creation of less noisy weakly labeled data. Experiments on internal e-commerce and public biomedical NER datasets demonstrate that WeSDoM outperforms existing SOTA baselines by a significant margin. We achieve new SOTA F1 scores on two popular biomedical NER datasets: BC5CDR-chem (94.27) and BC5CDR-disease (91.23).
In the wake of a fabricated explosion image at the Pentagon, the ability to discern real images from fake counterparts has never been more critical. Our study introduces a novel multi-modal approach to detect AI-generated images amidst the proliferation of new generation methods such as diffusion models. Our method, UGAD, encompasses three key detection steps: First, we transform the RGB images into YCbCr channels and apply an Integral Radial Operation to emphasize salient radial features. Second, a Spatial Fourier Extraction operation is used for a spatial shift, utilizing a pre-trained deep learning network for optimal feature extraction. Finally, the deep neural network classification stage processes the data through dense layers using softmax for classification. Our approach significantly enhances the accuracy of differentiating between real and AI-generated images, as evidenced by a 12.64% increase in accuracy and a 28.43% increase in AUC compared to existing state-of-the-art methods.
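A simplified stand-in for the spirit of the first two steps: convert RGB to YCbCr (only the luma channel is kept here), take the 2-D FFT, and integrate the spectrum over concentric radial bins to obtain a compact feature vector. This is written from the abstract's description; the exact Integral Radial Operation, Spatial Fourier Extraction, and use of the chroma channels are not reproduced.

```python
import numpy as np

def radial_fft_profile(rgb: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """Luma-channel FFT magnitude, summed over concentric radial bins."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b              # luma (ITU-R BT.601)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(y)))
    h, w = y.shape
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h / 2, xx - w / 2)
    bins = np.linspace(0, radius.max(), n_bins + 1)
    idx = np.digitize(radius, bins) - 1
    return np.array([spectrum[idx == k].sum() for k in range(n_bins)])

profile = radial_fft_profile(np.random.rand(128, 128, 3))
print(profile.shape)   # (64,) radial feature vector for a downstream classifier
```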
Retrieval-augmented generation (RAG) systems combine the strengths of language generation and information retrieval to power many real-world applications like chatbots. Using RAG for video understanding is appealing, but there are two critical limitations. One-time, upfront conversion of all content in a large corpus of videos into text descriptions entails high processing times. Also, not all information in the rich video data is typically captured in the text descriptions. Since user queries are not known a priori, developing a system for video-to-text conversion and interactive querying of video data is challenging.
To address these limitations, we propose an incremental RAG system called iRAG, which augments RAG with a novel incremental workflow to enable interactive querying of a large corpus of videos. Unlike traditional RAG, iRAG quickly indexes large repositories of videos, and in the incremental workflow, it uses the index to opportunistically extract more details from select portions of the videos to retrieve context relevant to an interactive user query. Such an incremental workflow avoids long video-to-text conversion times and overcomes information loss due to the conversion of video to text by performing on-demand, query-specific extraction of details from video data. This ensures high-quality responses to interactive user queries that are often not known a priori. To the best of our knowledge, iRAG is the first system to augment RAG with an incremental workflow to support efficient interactive querying of a large corpus of videos. Experimental results on real-world datasets demonstrate 23x to 25x faster video-to-text ingestion, while ensuring that the latency and quality of responses to interactive user queries are comparable to those of a traditional RAG where all video data is converted to text upfront before any user querying.
Generating a set of keyphrases that convey the main concepts discussed in a document has been applied to improve various applications including document retrieval and online advertising. The state-of-the-art approaches mostly rely on the neural sequence-to-sequence framework to generate keyphrases. However, training such deep neural networks either requires a significant amount of human efforts in obtaining ground truth keyphrases or suffers from lower quality training data derived from weakly supervised signals. More recently, pre-trained language models are fine-tuned to build more data-efficient keyphrase generation models. Yet, the documents often need to be truncated to adapt to the pre-trained context window. On the other hand, large language models (LLMs) have demonstrated impressive abilities in understanding very long text and generating answers for a wide range of natural language processing tasks, making them great candidates for improving keyphrase generation. There however is a lack of a systematic study on how to use LLMs, especially in an industrial setting that requires low generation latency. In this work, we present an empirical study to facilitate a more informed use of LLMs for keyphrase generation. We compare zero-shot and few-shot in-context learning with parameter efficient fine-tuning using a number of open-source LLMs. We show that using only a handful of well selected human annotated samples, the LLMs already outperform the fine-tuned language model baselines. When thousands of human labeled samples are available, fine-tuned large language models significantly improve the amount and the quality of the generated keyphrases. To enable efficient keyphrase generation at scale, we distill the knowledge from LLMs to a base-size language model. Our evaluation shows significant increase in user reach when the generated keyphrases are used for contextual targeting at Yahoo.
In digital marketing, precise audience targeting is crucial for campaign efficiency. However, digital marketing agencies often struggle with incomplete user profiles and interaction details from Advertising Identifier (ADID) data in user behavior modeling. To address this, we introduce Deep Journey Hierarchical Attention Networks (DJHAN). This novel method enhances conversion predictions by leveraging heterogeneous action sequences associated with ADIDs and encapsulating these interactions into structured journeys. These journeys are hierarchically aggregated to effectively represent an ADID's behavioral attributes. Moreover, DJHAN incorporates three specialized attention mechanisms: temporal attention for time-sensitive contexts, action attention for emphasizing key behaviors, and journey attention for highlighting influential journeys in the purchase conversion process. Empirically, DJHAN surpasses state-of-the-art (SOTA) models across three diverse datasets, including real-world data from NasMedia, a leading media representative in Asia. In backtesting simulations with three advertisers, DJHAN outperforms existing baselines, achieving the highest improvements in Conversion Rate (CVR) and Return on Ad Spend (ROAS), demonstrating its practical potential in digital marketing.
This paper introduces LiNR, LinkedIn's large-scale, GPU-based retrieval system. LiNR supports a billion-sized index on GPU models. We discuss our experiences and challenges in creating scalable, differentiable search indexes using TensorFlow and PyTorch at production scale. In LiNR, both items and model weights are integrated into the model binary. Viewing index construction as a form of model training, we describe scaling our system for large indexes, incorporating full scans and efficient filtering. A key focus is on enabling attribute-based pre-filtering for exhaustive GPU searches, addressing the common challenge of post-filtering in KNN searches that often reduces system quality. We further provide multi-embedding retrieval algorithms and strategies for tackling cold start issues in retrieval. Our advancements in supporting larger indexes through quantization are also discussed. We believe LiNR represents one of the industry's first Live-updated model-based retrieval indexes. Applied to out-of-network post recommendations on LinkedIn Feed, LiNR has contributed to a 3% relative increase in professional daily active users. We envisage LiNR as a step towards integrating retrieval and ranking into a single GPU model, simplifying complex infrastructures and enabling end-to-end optimization of the entire differentiable infrastructure through gradient descent.
With large neural models becoming increasingly accurate and powerful, they have raised privacy and transparency concerns about data usage. Therefore, data platforms, regulations, and user expectations are rapidly evolving, leading to privacy being enforced via aggregation. We focus on the use case of online advertising, where the emergence of aggregate data is imminent and can significantly impact the multi-billion dollar industry. In aggregated datasets, labels are assigned to groups of data points rather than individual data points. This leads to the formulation of a weakly supervised task, Learning from Label Proportions (LLP), where a model is trained on groups (a.k.a. bags) of instances and their corresponding label proportions to predict labels for individual instances. While learning on aggregate data due to privacy concerns is becoming increasingly popular, there is no large-scale benchmark for measuring performance and guiding improvements on this important task. We propose LLP-Bench, a web-scale benchmark with ~70 datasets and 45 million data points. To the best of our knowledge, LLP-Bench is the first large-scale tabular LLP benchmark with extensive diversity in its constituent datasets, realistic in terms of the sponsored search datasets used and the aggregation mechanisms followed. Through more than 3000 experiments, we compare the performance of 9 SOTA methods in detail. To the best of our knowledge, this is the first study that compares diverse approaches in such depth.
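The LLP training signal can be illustrated with a simple bag-level surrogate loss: average the instance predictions within each bag and match them to the known label proportion. The binary-cross-entropy form below is one common choice, assumed here for illustration; LLP-Bench itself benchmarks several such methods rather than prescribing one.

```python
import torch
import torch.nn.functional as F

def bag_proportion_loss(instance_logits: torch.Tensor,
                        bag_ids: torch.Tensor,
                        bag_proportions: torch.Tensor) -> torch.Tensor:
    """Instances carry no labels; the mean predicted positive rate inside
    each bag is matched to the known bag-level label proportion."""
    probs = torch.sigmoid(instance_logits.squeeze(-1))
    losses = []
    for b in bag_ids.unique():
        pred_prop = probs[bag_ids == b].mean()
        losses.append(F.binary_cross_entropy(pred_prop, bag_proportions[b]))
    return torch.stack(losses).mean()

logits = torch.randn(10, 1)
bags = torch.tensor([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
props = torch.tensor([0.2, 0.6])          # known proportions for bags 0 and 1
print(bag_proportion_loss(logits, bags, props))
```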
Video summarization techniques have been proven to improve the overall user experience when it comes to accessing and comprehending video content. If the user's preference is known, video summarization can identify significant information or relevant content from an input video, aiding users in obtaining the necessary information or determining their interest in watching the original video. Adapting video summarization to various types of video and user preferences requires significant training data and expensive human labeling. To facilitate such research, we propose a new benchmark for video summarization that captures various user preferences. We also present a pipeline called Video Summarization with Language (VSL) for user-preferred video summarization that is based on pre-trained visual language models (VLMs), avoiding the need to train a video summarization system on a large training dataset. The pipeline takes both video and closed captioning as input and performs semantic analysis at the scene level by converting video frames into text. Subsequently, the user's genre preference is used as the basis for selecting pertinent textual scenes. The experimental results demonstrate that our proposed pipeline outperforms current state-of-the-art unsupervised video summarization models. We show that our method is more adaptable across different datasets compared to supervised query-based video summarization models. Finally, a runtime analysis demonstrates that our pipeline is more suitable for practical use when scaling up the number of user preferences and videos.
Rich user behavior data has been proven to be of great value for recommendation systems. Modeling lifelong user behavior data in the retrieval stage to explore users' long-term preferences and obtain comprehensive retrieval results is crucial. Existing lifelong modeling methods cannot be applied to the retrieval stage because they extract target-relevant items through the coupling between the user and the target item. Moreover, current retrieval methods fail to precisely capture user interests when the length of the user behavior sequence increases further. This leads to a gap in the ability of retrieval models to model lifelong user behavior data. In this paper, we propose the concept of missing interest, leveraging the idea of complementarity, which serves as a supplement to short-term interest based on lifelong behavior data in the retrieval stage. Specifically, we design a missing interest operator and deploy it in a Kafka data stream, without incurring latency or storage costs. This operator derives categories and authors of items that the user was previously interested in but has recently missed, and uses these as triggers to output missing features to the downstream retrieval model. Our retrieval model is a complete dual-tower structure that combines short-term and missing interests on the user side to provide a comprehensive depiction of lifelong behaviors. Since 2023, the presented solution has been deployed in Kuaishou, one of the most popular short-video streaming platforms in China with hundreds of millions of active users.
Collaborative filtering on user-item interaction graphs has achieved success in industrial recommendation. However, recommending users' truly fascinating items poses a seesaw dilemma for collaborative filtering models learned from the interaction graph. On the one hand, not all items that users interact with are equally appealing: some items are genuinely fascinating to users, while others are not. Training graph collaborative filtering models without distinguishing between them can lead to the recommendation of unfascinating items to users. On the other hand, disregarding the interacted but unfascinating items during graph collaborative filtering results in an incomplete representation of users' interaction intent, leading to a decline in the model's recommendation capabilities. To address this seesaw problem, we propose Feedback Reciprocal Graph Collaborative Filtering (FRGCF), which emphasizes the recommendation of fascinating items while attenuating the recommendation of unfascinating items. Specifically, FRGCF first partitions the entire interaction graph into the Interacted & Fascinated (I&F) graph and the Interacted & Unfascinated (I&U) graph based on user feedback. Then, FRGCF introduces separate collaborative filtering on the I&F graph and the I&U graph with feedback-reciprocal contrastive learning and macro-level feedback modeling. This enables the I&F graph recommender to learn multi-grained interaction characteristics from the I&U graph without being misdirected by it. Extensive experiments on four benchmark datasets and a billion-scale industrial dataset demonstrate that FRGCF improves performance by recommending more fascinating items and fewer unfascinating items. In addition, online A/B tests on Taobao's recommender system verify the superiority of FRGCF.
Dynamically planning in complex systems has been explored to improve decision-making in various domains. Professional basketball serves as a compelling example of a dynamic spatio-temporal game, encompassing context-dependent decision-making. However, processing the diverse on-court signals and navigating the vast space of potential actions and outcomes make it difficult for existing approaches to swiftly identify optimal strategies in response to evolving circumstances. In this study, we formulate the sequential decision-making process as a conditional trajectory generation process. Based on the formulation, we introduce PlayBest (PLAYer BEhavior SynThesis), a method to improve player decision-making. We extend the diffusion probabilistic model to learn challenging environmental dynamics from historical National Basketball Association (NBA) player motion tracking data. To incorporate data-driven strategies, an auxiliary value function is trained with corresponding rewards. To accomplish reward-guided trajectory generation, we condition the diffusion model on the value function via classifier-guided sampling. We validate the effectiveness of PlayBest through simulation studies, contrasting the generated trajectories with those employed by professional basketball teams. Our results reveal that the model excels at generating reasonable basketball trajectories that produce efficient plays. Moreover, the synthesized play strategies exhibit an alignment with professional tactics, highlighting the model's capacity to capture the intricate dynamics of basketball games.
Utilizing market forecasts is pivotal in optimizing portfolio selection strategies. We introduce DeepClair, a novel framework for portfolio selection. DeepClair leverages a transformer-based time-series forecasting model to predict market trends, facilitating more informed and adaptable portfolio decisions. To integrate the forecasting model into a deep reinforcement learning-driven portfolio selection framework, we introduced a two-step strategy: first, pre-training the time-series model on market data, followed by fine-tuning the portfolio selection architecture using this model. Additionally, we investigated the optimization technique, Low-Rank Adaptation (LoRA), to enhance the pre-trained forecasting model for fine-tuning in investment scenarios. This work bridges market forecasting and portfolio selection, facilitating the advancement of investment strategies.
We present Blind-Match, a novel biometric identification system that leverages homomorphic encryption (HE) for efficient and privacy-preserving 1:N matching. Blind-Match introduces a HE-optimized cosine similarity computation method, where the key idea is to divide the feature vector into smaller parts for processing rather than computing the entire vector at once. By optimizing the number of these parts, Blind-Match minimizes execution time while ensuring data privacy through HE. Blind-Match achieves superior performance compared to state-of-the-art methods across various biometric datasets. On the LFW face dataset, Blind-Match attains a 99.63% Rank-1 accuracy with a 128-dimensional feature vector, demonstrating its robustness in face recognition tasks. For fingerprint identification, Blind-Match achieves a remarkable 99.55% Rank-1 accuracy on the PolyU dataset, even with a compact 16-dimensional feature vector, significantly outperforming the state-of-the-art method, Blind-Touch, which achieves only 59.17%. Furthermore, Blind-Match showcases practical efficiency in large-scale biometric identification scenarios, such as Naver Cloud's FaceSign, by processing 6,144 biometric samples in 0.74 seconds using a 128-dimensional feature vector.
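The part-wise similarity idea can be illustrated in plaintext: normalize once, then accumulate the cosine score from partial inner products over vector chunks, each of which would be packed into its own ciphertext under HE. The chunk count below is arbitrary; the real system tunes the number of parts and evaluates them under homomorphic encryption, which this sketch omits entirely.

```python
import numpy as np

def chunked_cosine(query: np.ndarray, gallery: np.ndarray, n_parts: int = 4):
    """Plaintext analogue of an HE-friendly 1:N cosine similarity: the dot
    product is accumulated part by part instead of over the full vector."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = np.zeros(len(gallery))
    for q_part, g_part in zip(np.array_split(q, n_parts),
                              np.array_split(g, n_parts, axis=1)):
        scores += g_part @ q_part           # partial inner products, summed
    return scores

sims = chunked_cosine(np.random.rand(128), np.random.rand(1000, 128))
print(sims.argmax())   # Rank-1 candidate in a toy 1:N match
```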
Although the use of AI systems in today's world is widespread and growing, many current AI systems are vulnerable to hidden bias and missing information, especially the most commonly used forecasting systems. In this work, we explore the robustness and explainability of AI-based forecasting systems. We provide an in-depth analysis of the underlying causality involved in the effect prediction task and further establish a causal graph based on treatment, adjustment variable, confounder, and outcome. Correspondingly, we design a causal interventional prediction system (CIPS) based on a variational autoencoder and fully conditional specification of multiple imputations. Extensive results demonstrate the superiority of our system over state-of-the-art methods and show remarkable versatility and extensibility in practice.
In the domain of e-commerce, query rewriting is a potent strategy for bridging the lexical gap between search queries and product descriptions, thereby enhancing the recall rate of search engines. This research introduces a query rewriting framework predicated on large language models (LLMs), encompassing three phases of training: domain-specific pre-training, supervised fine-tuning (SFT), and reinforcement learning (RL) for objective alignment. In detail, the process begins with domain-specific pre-training using consumer behavior data and product descriptions from JD.com. Subsequently, we filter and utilize high-quality query-rewrite pairs for SFT. The final stage employs RL to refine the model's objective alignment, utilizing an offline search system as the simulation environment. The RL training reward is derived from the recall rate, aiming to maximize the number of relevant products the rewrites retrieve. Through offline evaluations, our method has demonstrated its capacity to substantially enhance the efficacy of LLMs for e-commerce query rewriting. Moreover, online A/B testing has corroborated that our approach significantly boosts the number of purchases made per user (UCVR). Since December 2023, our approach has been successfully implemented on JD.com, one of China's most frequented online shopping platforms.
Learning high-quality item embeddings is crucial for recommendation tasks such as matching and ranking. However, existing methods often rely on ID-based item embeddings learned end-to-end with downstream recommendation models, which may suffer from overfitting and limited generalizability. In this paper, we aim to learn universal item embeddings (dubbed UniEmbedding) that capture multi-modal semantics, generalize across multiple domains, and serve different downstream tasks. To achieve this goal, we introduce the UniEmbedding pretraining framework, which includes three modules: a domain-aware multi-modal adapter, a user-view projection module, and contrastive learning objectives across domains. Compared to naive ID embeddings, UniEmbedding provides rich semantic information that generalizes more effectively across domains. Unlike multi-modal embeddings directly extracted from off-the-shelf pretrained models, UniEmbedding achieves better alignment between content semantics and behaviors. We evaluated UniEmbedding on both public and industrial datasets, demonstrating its effectiveness in matching and ranking tasks. Furthermore, UniEmbedding has been deployed in multiple recommendation applications at Huawei, resulting in significant gains in user engagement metrics.
The Airbnb search system grapples with many unique challenges as it continues to evolve. We oversee a marketplace that is nuanced by geography, diversity of homes, and guests with a variety of preferences. Crafting an efficient search system that can accommodate diverse guest needs, while showcasing relevant homes lies at the heart of Airbnb's success. Airbnb search has many challenges that parallel other recommendation and search systems but it has a unique information retrieval problem, upstream of ranking, called location retrieval. It requires defining a topological map area that is relevant to the searched query for homes listing retrieval. The purpose of this paper is to demonstrate the methodology, challenges, and impact of building a machine learning based location retrieval product from the ground up. Despite the lack of suitable, prevalent machine learning based approaches, we tackle cold start, generalization, differentiation and algorithmic bias. We detail the efficacy of heuristics, statistics, machine learning, and reinforcement learning approaches to solve these challenges, particularly for systems that are often unexplored by current literature.
Recent innovations have made it possible to produce millions of distinct nanoparticles on a chip. These vast volumes of data are impossible to analyze manually, necessitating the development of automated tools. In previous work, we created a binary classification machine learning model to select quality nanoparticle images for downstream analysis. In this work, we show that adding a custom image preprocessing step before model training can produce significantly higher-performing models in a fraction of the time and make the model more robust to different image noise levels and microscope acquisition settings. The proposed image processing pipeline effectively cleans raw nanoparticle images, enhances key features, and allows us to use much lower resolution images and simpler neural network model architectures, resulting in higher performance and significant cost savings. Experiments demonstrate superior performance relative to our baseline, including a 15% improvement in recall and more than a 10% increase in accuracy. Given the high cost of downstream analysis, it is critical to minimize false positives in our application, and our best-performing model obtains a precision of 97.3% and weighted F-score of 95.9% on an unseen test set. Additionally, model training time is reduced from 15.5 hours to 32 seconds. We expect that adopting this pipeline for AI-driven automated nanoparticle characterization will offer a considerable speedup in the laboratory, allowing researchers to rapidly and accurately analyze much greater volumes of data and accelerate materials discovery.
Photovoltaic (PV) power stations have become an integral component of the global sustainable energy landscape. Accurately monitoring and estimating the performance of PV systems is critical to their feasibility for power generation and as a financial asset. One of the most challenging problems is to understand and estimate the long-term Performance Loss Rate (PLR) for large fleets of PV inverters. This paper introduces a novel Spatio-Temporal Graph Neural Network empowered, long-term Trend analysis system (ST-GTrend) to estimate the PLR of PV systems at fleet level. ST-GTrend nontrivially integrates spatio-temporal coherence and graph attention to separate PLR as a long-term 'aging' trend from multiple fluctuation terms in the PV input data, with a design that scales to large PV sets through effective, multi-level parallel computation. (1) To cope with diverse degradation patterns in time series, ST-GTrend adopts a paralleled graph autoencoder array to extract aging and fluctuation terms simultaneously, and imposes flatness and smoothness regularizations to disentangle aging from fluctuation. (2) For large PV systems, ST-GTrend enables a multi-level parallelization paradigm to scale training and inference with a provable performance guarantee. ST-GTrend has been deployed in CRADLE, a scientific high performance computing infrastructure. We evaluated ST-GTrend with three real-world large-scale PV datasets, spanning a time period of 10 years. Our results show that ST-GTrend reduces MAPE and Euclidean distance-based errors on average by 34.74% and 33.66% relative to SOTA methods, and scales well to large PV sets. We also showcase that the advantages of ST-GTrend generalize to long-term trend analysis needs in financial and economic data.
In the dynamic landscape of large enterprise cybersecurity, accurately and efficiently correlating billions of security alerts into comprehensive incidents is a substantial challenge. Traditional correlation techniques often struggle with maintenance, scaling, and adapting to emerging threats and novel sources of telemetry. We introduce GraphWeaver, an industry-scale framework that shifts the traditional incident correlation process to a data-optimized, geo-distributed, graph-based approach. GraphWeaver introduces a suite of innovations tailored to handle the complexities of correlating billions of shared-evidence alerts across hundreds of thousands of enterprises. Key among these innovations are a geo-distributed database and PySpark analytics engine for large-scale data processing, a minimum spanning tree algorithm to optimize correlation storage, integration of security domain knowledge and threat intelligence, and a human-in-the-loop feedback system to continuously refine key correlation processes and parameters. GraphWeaver is integrated into the Microsoft Defender XDR product and deployed worldwide, handling billions of correlations with a 99% accuracy rate, as confirmed by customer feedback and extensive investigations by security experts. This integration has not only maintained high correlation accuracy but also reduced traditional correlation storage requirements by 7.4x. We provide an in-depth overview of the key design and operational features of GraphWeaver, setting a precedent as the first cybersecurity company to openly discuss these critical capabilities at this level of depth.
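The minimum-spanning-tree idea mentioned above can be pictured with a small sketch: if correlated alerts form a weighted graph whose edges encode shared evidence, storing only a spanning tree (or forest) keeps every incident connected while discarding redundant edges. The snippet below is a generic illustration with made-up alert IDs and similarity scores, not GraphWeaver's implementation.

```python
# Illustrative sketch (not GraphWeaver's code): keep only a spanning forest of
# the shared-evidence graph so each incident stays connected while redundant
# correlation edges are dropped. Alert IDs and similarities are made up.
import networkx as nx

def compress_correlations(edges):
    """edges: iterable of (alert_a, alert_b, similarity in [0, 1])."""
    g = nx.Graph()
    for a, b, sim in edges:
        # Lower weight = stronger evidence, so the minimum spanning tree
        # retains the strongest links needed to connect each incident.
        g.add_edge(a, b, weight=1.0 - sim)
    return nx.minimum_spanning_tree(g, weight="weight")

if __name__ == "__main__":
    shared_evidence = [
        ("alert-1", "alert-2", 0.95),
        ("alert-2", "alert-3", 0.90),
        ("alert-1", "alert-3", 0.40),   # redundant weaker edge, dropped by the MST
        ("alert-4", "alert-5", 0.80),
    ]
    tree = compress_correlations(shared_evidence)
    print(sorted(tree.edges()))                    # edges actually stored
    print(nx.number_connected_components(tree))    # incidents preserved: 2
```

For an incident spanning n alerts, the stored edge count drops from up to n(n-1)/2 to n-1, which is where this kind of storage saving comes from.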
Listeners of long-form talk-audio content, such as podcast episodes, often find it challenging to understand the overall structure and locate relevant sections. A practical solution is to divide episodes into chapters: semantically coherent segments labeled with titles and timestamps. Since most episodes on our platform at Spotify currently lack creator-provided chapters, automating the creation of chapters is essential. Scaling the chapterization of podcast episodes presents unique challenges. First, episodes tend to be less structured than written texts, featuring spontaneous discussions with nuanced transitions. Second, the transcripts are usually lengthy, averaging about 16,000 tokens, which necessitates efficient processing that can preserve context. To address these challenges, we introduce PODTILE, a fine-tuned encoder-decoder transformer for segmenting conversational data. The model simultaneously generates chapter transitions and titles for the input transcript. To preserve context, each input text is augmented with global context, including the episode's title, description, and previous chapter titles. In our intrinsic evaluation, PODTILE achieved an 11% improvement in ROUGE score over the strongest previous baseline. Additionally, we provide insights into the practical benefits of auto-generated chapters for listeners navigating episode content. Our findings indicate that auto-generated chapters serve as a useful tool for engaging with less popular podcasts. Finally, we present empirical evidence that using chapter titles can enhance the effectiveness of sparse retrieval in search tasks.
Here, we describe one of the first Web-scale hybrid Knowledge Graph (KG)-Large Language Model (LLM) systems, populated with the latest peer-reviewed medical knowledge on colorectal cancer. It is currently being evaluated to assist with both medical research and clinical information retrieval tasks at Moffitt Cancer Center and Research Institute, one of the top cancer centers in the U.S. and in the world. Our hybrid system serves user needs better than an LLM, a KG, or a search engine in isolation. LLMs are known to exhibit hallucinations and catastrophic forgetting, and they are trained on outdated corpora. State-of-the-art KGs, such as PrimeKG, cBioPortal, ChEMBL, and NCBI, require manual curation and hence quickly become stale. CancerKG is unsupervised and is capable of automatically ingesting and organizing the latest medical findings. To alleviate the LLM's shortcomings, the verified KG serves as a Retrieval Augmented Generation (RAG) guardrail. CancerKG offers five different advanced user interfaces, each tailored to serve different data modalities better and more conveniently for the user. We evaluated CancerKG on real user queries and report a high NDCG score on a large-scale corpus of approximately 44K publications.
In real-time bidding systems, ad exchanges and supply-side platforms (SSP) are switching from the second-price auction (SPA) to the first-price auction (FPA), where advertisers pay what they bid if they win the auction. To avoid overpaying, advertisers are motivated to conceal their truthful evaluations of impression opportunities through bid shading methods. However, advertisers consistently face a trade-off between the probability and the cost-saving of winning, due to information asymmetry: advertisers lack knowledge about their competitors' bids in the market. To address this challenge, we propose a Bayesian Multi-Armed Bandit (BayesMAB) algorithm for bid shading when the winning price is unknown to advertisers who lose the impression opportunity. BayesMAB incorporates the mechanism of FPA to infer each price interval's winning rate by progressively updating the market price hidden by the SSP. In this way, BayesMAB better approximates the winning rates of price intervals and is thus able to derive the optimal shaded bid that balances the trade-off between the probability and the cost-saving of winning the impression opportunity. We conducted large-scale A/B tests on Tencent's online display advertising platform. The cost-per-mille (CPM) and cost-per-action (CPA) decreased by 13.06% and 11.90%, respectively, whereas the return on investment (ROI) increased by 12.31% with only a 2.7% sacrifice of the winning rate. We also validated BayesMAB's superior performance in an offline semi-simulated experiment with SPA data sets. BayesMAB has been deployed online and handles billions of traffic requests every day. Code is available at https://github.com/BayesMAB/BayesMAB.
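To make the bandit view of bid shading concrete, here is a minimal Thompson-sampling sketch over a discretized price grid. It is not the paper's BayesMAB, whose key contribution is inferring interval-level winning rates under FPA censoring, but it shows how posterior win-rate estimates translate into a shaded bid that trades win probability off against surplus. The price grid, Beta priors, and update rule are all illustrative assumptions.

```python
# A minimal Thompson-sampling sketch of bandit-based bid shading (not BayesMAB):
# each discretized price interval keeps a Beta posterior over its win rate, and
# the shaded bid maximizes the sampled surplus win_prob * (value - price).
import random

class ShadingBandit:
    def __init__(self, price_grid):
        self.price_grid = price_grid
        self.posteriors = {p: [1.0, 1.0] for p in price_grid}  # Beta(alpha, beta)

    def choose_bid(self, value):
        best_price, best_surplus = None, float("-inf")
        for p in self.price_grid:
            if p >= value:
                continue
            a, b = self.posteriors[p]
            win_prob = random.betavariate(a, b)      # Thompson sample of the win rate
            surplus = win_prob * (value - p)
            if surplus > best_surplus:
                best_price, best_surplus = p, surplus
        return best_price

    def update(self, price, won):
        # In FPA the loser never observes the market price; the paper infers
        # interval-level win rates instead. Updating only the played interval
        # here is a naive placeholder for that inference step.
        a, b = self.posteriors[price]
        self.posteriors[price] = [a + won, b + (1 - won)]

bandit = ShadingBandit(price_grid=[1.0, 2.0, 3.0, 4.0, 5.0])
bid = bandit.choose_bid(value=6.0)
bandit.update(bid, won=1)
```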
Graph Neural Networks (GNNs) have emerged as powerful tools for supervised machine learning over graph-structured data, while sampling-based node representation learning is widely utilized in unsupervised learning. However, scalability remains a major challenge in both supervised and unsupervised learning for large graphs (e.g., those with over 1 billion nodes). The scalability bottleneck largely stems from the mini-batch sampling phase in GNNs and the random walk sampling phase in unsupervised methods. These processes often require storing features or embeddings in memory. In the context of distributed training, they require frequent, inefficient random access to data stored across different workers. Such repeated inter-worker communication for each mini-batch leads to high communication overhead and computational inefficiency.
We propose GraphScale, a unified framework for both supervised and unsupervised learning that stores and processes large graph data in a distributed manner. The key insight in our design is the separation of workers who store data from those who perform the training. This separation allows us to decouple computing and storage in graph training, thus effectively building a pipeline where data fetching and data computation can overlap asynchronously. Our experiments show that GraphScale outperforms state-of-the-art methods for distributed training of both GNNs and node embeddings. We evaluate GraphScale on both public and proprietary graph datasets and observe a reduction of at least 40% in end-to-end training times compared to popular distributed frameworks, without any loss in performance. While most existing methods do not support billion-node graphs for training node embeddings, GraphScale is currently deployed in production at TikTok, enabling efficient learning over such large graphs.
Modern manufacturing relies heavily on fusion welding processes, including gas metal arc welding (GMAW), which efficiently converts electrical energy into thermal energy to join metals. Despite decades of research and extensive application in the automotive and aerospace sectors, weld quality assessment in the GMAW process remains a major challenge. This paper presents a novel learning-based approach relying on a vector quantised variational autoencoder (VQ-VAE) for data representation. In addition, we are the first to provide a time series dataset to the research community that combines labeled and unlabeled time series data from the GMAW domain, thereby enabling further research. The core idea of our approach consists of two stages: In the first stage, we use learned automatic extraction of local features of the input signal using a VQ-VAE architecture. Based on this, in the second stage, we use a transformer model that processes the discretized features and performs weld quality prediction and classification. Our approach addresses real-world scenarios, improves quality prediction, and fills existing data gaps by providing a reliable approach for sensor-based quality assessment during manufacturing.
In recent years, there has been a growing interest in probabilistic forecasting methods that offer more comprehensive insights by considering prediction uncertainties rather than point estimates. This paper introduces a novel variational autoencoder learning framework for multivariate distributional forecasting. Our approach employs distributional learning to directly estimate the cumulative distribution function of future time series conditional distributions using the continuous ranked probability score. By incorporating a temporal structure within the latent space and utilizing versatile quantile models, such as the generalized lambda distribution, we enable distributional forecasting by generating synthetic time series data for future time points. To assess the effectiveness of our method, we conduct experiments using a multivariate dataset of real cryptocurrency prices, demonstrating its superiority in forecasting high-volatility scenarios.
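As a point of reference for the training objective mentioned above, the continuous ranked probability score can be estimated from samples of the predicted distribution via the standard energy form CRPS(F, y) ~ E|X - y| - 0.5 E|X - X'|. The snippet below computes this estimator on synthetic samples; it only illustrates the scoring rule, not the paper's model or quantile parameterization.

```python
# A sample-based CRPS estimator (standard energy form), shown only to make the
# training signal concrete; the samples here are synthetic stand-ins for draws
# from a model's predicted conditional distribution.
import numpy as np

def crps_from_samples(samples, y):
    """CRPS(F, y) ~ E|X - y| - 0.5 * E|X - X'| with X, X' ~ F."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    forecast_samples = rng.normal(loc=100.0, scale=5.0, size=512)
    print(crps_from_samples(forecast_samples, y=103.0))
```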
Polymer property performance prediction aims to forecast specific features or attributes of polymers, which has become an efficient approach to measuring their performance. However, existing machine learning models face challenges in effectively learning polymer representations due to low-quality polymer datasets, which consequently impact their overall performance. This study focuses on improving polymer property performance prediction tasks by reconstructing an optimal and explainable descriptor representation space. Nevertheless, prior research such as feature engineering and representation learning can only partially solve this task since they are either labor-intensive or unexplainable. This raises two issues: 1) automatic transformation and 2) explainable enhancement. To tackle these issues, we propose our unique Traceable Group-wise Reinforcement Generation Perspective. Specifically, we redefine the reconstruction of the representation space into an interactive process, combining nested generation and selection. Generation creates meaningful descriptors, and selection eliminates redundancies to control descriptor sizes. Our approach employs cascading reinforcement learning with three Markov Decision Processes, automating descriptor and operation selection, and descriptor crossing. We utilize a group-wise generation strategy to explore and enhance reward signals for cascading agents. Ultimately, we conduct experiments to indicate the effectiveness of our proposed framework.
This paper introduces DAMOCRO, a data migration framework using online classification and tuple reordering to improve throughput and decrease the costs of data migration. The DAMOCRO workflow consists of four main steps. First, it classifies records into subgroups to maximize the similarity within each group. Next, it reorders tuples within these groups, ensuring that similar tuples are adjacent. Subsequently, column-wise compression is applied to each group. Finally, the compressed data is transferred from the source to the target machine. The initial two steps enhance the compression ratio, thereby boosting throughput and reducing costs. Our evaluations on five real-world datasets and two benchmark datasets show that the online classification process in DAMOCRO improves throughput by more than 24% and reduces costs by over 19% compared to baselines. Besides, implementing reordering based on functional dependencies brings an additional cost reduction ranging from 10% to 60%, while also enhancing throughput.
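The classify-reorder-compress idea can be sketched in a few lines: grouping similar records and sorting within each group places repeated values next to each other, so column-wise compression does better. The toy example below uses a hard-coded grouping key, synthetic rows, and zlib as a stand-in compressor; it is not DAMOCRO's actual classifier or compression scheme.

```python
# An illustrative (not DAMOCRO's) pipeline: group similar records, reorder
# tuples within each group, then compress column-wise so repeated values sit
# next to each other. Grouping key, sort key, and data are made up.
import random
import zlib

def columnwise_compressed_size(rows):
    if not rows:
        return 0
    columns = list(zip(*rows))
    return sum(len(zlib.compress(",".join(map(str, col)).encode())) for col in columns)

def migrated_size(rows, group_key, sort_key):
    groups = {}
    for row in rows:
        groups.setdefault(group_key(row), []).append(row)  # stand-in for online classification
    total = 0
    for group in groups.values():
        group.sort(key=sort_key)                            # tuple reordering within the group
        total += columnwise_compressed_size(group)          # column-wise compression per group
    return total

random.seed(0)
rows = [(random.randrange(5), f"user{random.randrange(40)}",
         random.choice(["US", "DE", "JP", "BR"]), random.randrange(10**6))
        for _ in range(5000)]
baseline = columnwise_compressed_size(rows)
optimized = migrated_size(rows, group_key=lambda r: r[0], sort_key=lambda r: r[1:])
print(baseline, optimized)  # the grouped + reordered layout typically compresses better
```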
The legal landscape encompasses a wide array of lawsuit types, presenting lawyers with challenges in delivering timely and accurate information to clients, particularly concerning critical aspects like potential imprisonment duration or financial repercussions. Compounded by the scarcity of legal experts, there's an urgent need to enhance the efficiency of traditional legal workflows. Recent advances in deep learning, especially Large Language Models (LLMs), offer promising solutions to this challenge. Leveraging LLMs' mathematical reasoning capabilities, we propose a novel approach integrating LLM-based methodologies with specially designed prompts to address precision requirements in legal Artificial Intelligence (LegalAI) applications. The proposed work seeks to bridge the gap between traditional legal practices and modern technological advancements, paving the way for a more accessible, efficient, and equitable legal system. To validate this method, we introduce a curated dataset tailored to precision-oriented LegalAI tasks, serving as a benchmark for evaluating LLM-based approaches. Extensive experimentation confirms the efficacy of our methodology in generating accurate numerical estimates within the legal domain, emphasizing the role of LLMs in streamlining legal processes and meeting the evolving demands of LegalAI. Github: https://github.com/Jhhuangkay/Optimizing-Numerical-Estimation-and-Operational-Efficiency-in-the-Legal-Domain.
Cross-domain recommendation has attracted substantial interest in industrial apps such as Meituan, which serves multiple business domains via knowledge transfer and meets the diverse interests of users. However, existing methods typically follow an implicit modeling paradigm that blends the knowledge from both the source and target domains, and design intricate network structures to share learned embeddings or patterns between domains to improve recommendation accuracy. Since the transfer of interest signals is unsupervised, these implicit paradigms often struggle with the negative transfer resulting from differences in service functions and presentation forms across different domains. In this paper, we propose a simple and effective EXplicit Interest Transfer framework named EXIT to address the stated challenge. Specifically, we propose a novel label combination approach that enables the model to directly learn beneficial source domain interests through supervised learning, while excluding inappropriate interest signals. Moreover, we introduce a scene selector network to model the interest transfer intensity under fine-grained scenes. Offline experiments conducted on the industrial production dataset and online A/B tests validate the superiority and effectiveness of our proposed framework. Without complex network structures or training processes, EXIT can be easily deployed in the industrial recommendation system. EXIT has been successfully deployed in the online homepage recommendation system of Meituan App, serving the main traffic.
As online transactions rapidly increase, money laundering has become more difficult to detect, rendering traditional rule-based algorithms inadequate for the current severe laundering landscape. Although efforts have been made to model user behavior sequences for detecting money laundering, these approaches still fall short in scenarios with extremely low anomaly rates. In our anti-money laundering practices, we have identified the following three challenges: weak perception of intensity, scarce labels, and poor representation robustness. In this paper, we present CLeAR, a novel robust sequence-based self-supervised Representation Learning framework for Anti-Money Laundering. To address the weak perception of intensity, we devise an Intensity-Aware Transformer to better capture the nuances of user behavior sequences. By introducing sequence-based Contrastive Learning into this task, we effectively tackle the issue of scarce labels and enhance sequence modeling. Additionally, we develop two self-supervised learning tasks, next behavior matching and sub-sequence matching, that significantly enhance the overall robustness of the representation. After rigorous experiments across datasets of various scales, CLeAR consistently delivers exceptional performance, even under the extremely low anomaly rates that closely mimic real-world conditions.
Graph-based Recommendation Systems (GRSs) have gained prominence for their ability to enhance the accuracy and effectiveness of recommender systems by exploiting structural relationships in user-item interaction data. Despite their advanced capabilities, we find GRSs are susceptible to feedback-loop phenomena that disproportionately diminish the visibility of new and long-tail items, leading to a homogenization of recommendations and the potential emergence of echo chambers. To mitigate this feedback-loop issue, exploration and exploitation (E&E) strategies have been extensively researched. However, conventional E&E methods rest on the assumption that recommendations are independent and identically distributed, an assumption that is not valid for GRSs. To forge an effective E&E approach tailored to GRSs, we introduce a novel framework, GRADient-informed Exploration and Exploitation (GRADE), designed to adaptively seek out underrepresented or new items with promising rewards. Our method evaluates the potential benefit of exploring an item by assessing the change in the system's empirical risk error pre- and post-exposure. For practical implementation, we approximate this measure using the gradients of potential edges and model parameters, alongside their associated uncertainties. We then orchestrate the balance between exploration and exploitation utilizing Thompson sampling and the Upper Confidence Bound (UCB) strategy. Empirical tests on datasets from two industrial environments demonstrate that GRADE consistently outperforms existing state-of-the-art methods. Additionally, our approach has been successfully integrated into actual industrial systems.
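A rough sketch of the UCB-style variant of this idea is given below: items are ranked by predicted reward plus an exploration bonus, where a gradient-magnitude proxy (scaled down by exposure count) stands in for the paper's empirical-risk-change measure. All inputs and the bonus form are illustrative assumptions, not GRADE's actual estimator.

```python
# A hedged sketch of gradient-informed exploration in the UCB style: the bonus
# uses a gradient-magnitude proxy for how much exposing an item would change
# the model, which only approximates the empirical-risk-change measure
# described above. All quantities here are synthetic.
import numpy as np

def grade_ucb_rank(pred_reward, grad_norm, exposure_count, c=1.0):
    """Rank items by predicted reward plus an uncertainty-style exploration bonus."""
    pred_reward = np.asarray(pred_reward, float)
    grad_norm = np.asarray(grad_norm, float)
    exposure_count = np.asarray(exposure_count, float)
    bonus = c * grad_norm / np.sqrt(exposure_count + 1.0)  # larger for new/long-tail items
    return np.argsort(-(pred_reward + bonus))

print(grade_ucb_rank(pred_reward=[0.30, 0.28, 0.10],
                     grad_norm=[0.05, 0.40, 0.90],
                     exposure_count=[5000, 40, 3]))
```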
In recommendation systems, the relevance and novelty of the final results are determined through a cascade system of Matching -> Ranking -> Strategy. The matching model serves as the starting point of the pipeline and determines the upper bound of the subsequent stages. Balancing the relevance and novelty of matching results is a crucial step in the design and optimization of recommendation systems, contributing significantly to improving recommendation quality. However, typical matching algorithms have not simultaneously addressed relevance and novelty perfectly. One main reason is that deep matching algorithms exhibit significant uncertainty when estimating long-tail items (e.g., due to insufficient training samples).
The uncertainty not only affects the training of the models but also influences the confidence in the index construction and beam search retrieval process of these models.
This paper proposes the UICR (Uncertainty-based explore for Index Construction and Retrieval) algorithm, which introduces the concept of uncertainty modeling in the matching stage and achieves multi-task modeling of model uncertainty and index uncertainty. The final matching results are obtained by combining the relevance score and uncertainty score inferred by the model. Experimental results demonstrate that UICR improves novelty without sacrificing relevance in real-world industrial production environments and on multiple open-source datasets. Remarkably, online A/B test results of display advertising in Shopee demonstrate the effectiveness of the proposed algorithm.
Graph Neural Networks (GNNs) have become critical in various domains such as online advertising but face scalability challenges due to the growing size of graph data, leading to the need for advanced distributed GPU computation strategies across multiple nodes. This paper presents PGLBox-Cluster, a robust distributed graph learning framework constructed atop the PaddlePaddle platform, implemented to efficiently process graphs comprising billions of nodes and edges. Through strategic partitioning of the model, node attributes, and graph data, and by leveraging industrial-grade RPC and NCCL for communication, PGLBox-Cluster facilitates effective distributed computation. The extensive experimental results confirm that PGLBox-Cluster achieves a 1.94x to 2.93x speedup over the single-node configuration, significantly elevating graph neural network scalability and efficiency by handling datasets exceeding 3 billion nodes and 120 billion edges with its novel asynchronous communication and graph partitioning techniques. The repository is released at This Link.
Recommender systems with cascading architecture play an increasingly significant role in online recommendation platforms, where the approach to dealing with negative feedback is a vital issue. For instance, in short video ad platforms, users tend to quickly slip away from ad candidates that they find aversive, and recommender systems are expected to receive this explicit negative feedback and make adjustments to avoid such recommendations. Considering the recency effect in memory, we propose a forgetting model based on the Ebbinghaus Forgetting Curve to cope with negative feedback. In addition, we introduce a Pareto optimization solver to guarantee a better trade-off between recency and model performance. In conclusion, we propose the Pareto-based Multi-Objective Recommender System with forgetting curve (PMORS), which can be applied to any multi-objective recommendation and shows clear superiority when facing explicit negative feedback. We have conducted evaluations of PMORS and achieved favorable outcomes in short-video scenarios on both a public dataset and an industrial dataset. After being deployed on an online short video ad platform named WeChat Channels Ads in May 2023, PMORS has not only demonstrated promising results for both consistency and recency but also achieved an improvement of up to +1.45% in Gross Merchandise Volume (GMV).
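The forgetting-curve component can be illustrated directly: an Ebbinghaus-style retention function R(t) = exp(-t/S) makes the penalty for an explicitly disliked ad strongest immediately after the negative feedback and lets it decay as time passes. The constants and the way the penalty enters the score below are assumptions for illustration, not PMORS's exact formulation.

```python
# A minimal sketch of weighting explicit negative feedback with an
# Ebbinghaus-style forgetting curve: the penalty is strongest right after the
# dislike and decays with elapsed time. The retention-strength constant and
# penalty weight are illustrative, not the paper's values.
import math

def forgetting_weight(hours_since_negative_feedback, strength_hours=24.0):
    """Ebbinghaus-style retention R = exp(-t / S)."""
    return math.exp(-hours_since_negative_feedback / strength_hours)

def adjusted_score(base_score, hours_since_negative_feedback, penalty=0.5):
    return base_score - penalty * forgetting_weight(hours_since_negative_feedback)

print(adjusted_score(0.8, 1))    # heavily penalized: the dislike is recent
print(adjusted_score(0.8, 72))   # penalty has mostly decayed
```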
In recent years, Contrastive Learning (CL) has become a predominant representation learning paradigm for time series. Most existing methods manually build specific CL Strategies (CLS) by human heuristics for certain datasets and tasks. However, manually developing CLS usually requires excessive prior knowledge about the data, and massive experiments to determine the detailed CL configurations. In this paper, we present an Automated Machine Learning (AutoML) practice at Microsoft, which automatically learns CLS for time series datasets and tasks, namely Automated Contrastive Learning (AutoCL). We first construct a principled search space of size over 3 × 10^12, covering data augmentation, embedding transformation, contrastive pair construction, and contrastive losses. Further, we introduce an efficient reinforcement learning algorithm, which optimizes CLS from the performance on the validation tasks, to obtain effective CLS within the space. Experimental results on various real-world datasets demonstrate that AutoCL could automatically find the suitable CLS for the given dataset and task. From the candidate CLS found by AutoCL on several public datasets/tasks, we compose a transferable Generally Good Strategy (GGS), which has a strong performance for other datasets. We also provide empirical analysis as a guide for the future design of CLS.
Complex dialog systems often use retrieved evidence to facilitate factual responses. Such RAG (Retrieval Augmented Generation) systems retrieve from heterogeneous data stores that are architected as multiple indexes or APIs instead of a single monolithic source. For a given query, relevant evidence needs to be retrieved from one (or a few) retrieval sources. Complex queries can even require multi-step retrieval. For example, a conversational agent on a retail site answering customer questions about past orders needs to retrieve the appropriate customer order first and then the evidence relevant to the customer's question in the context of the ordered product. Most RAG agents handle such Chain-of-Thought (CoT) tasks by interleaving reasoning and retrieval steps. However, each reasoning step directly adds to the latency of the system. For large models, this latency cost is significant, on the order of multiple seconds. Multi-agent systems may classify the query to a single agent associated with a retrieval source, which means that a (small) classification model dictates the performance of a large language model. To address this problem, we present REAPER (REAsoning-based PlannER), an LLM-based retrieval planner that we evaluate on a conversational shopping assistant; it shows significant gains in latency over agent-based systems and scalability to new and unseen use cases when compared to classification-based planning.
As e-commerce stores broaden their reach into new regions and introduce new products within established markets, the development of effective machine learning models becomes increasingly challenging due to the scarcity of labeled data. Traditional transfer learning methods typically require some labeled data from the target domain and often face computational bottlenecks. Despite the availability of a few transfer learning techniques, most are primarily developed for vision and text applications, making them unsuitable for other types of data. In many industries, however, tabular data is a predominant and crucial data type. Our work introduces XCapsUTL, a novel unsupervised transfer learning framework specifically designed for tabular data, aiming to fill this significant gap. Our approach leverages Capsule Neural Networks (CapsNet) to extract domain-agnostic knowledge. This knowledge is then refined using a constrained fine-tuning process, ensuring adaptability to the target task while preserving learned representations. XCapsUTL's unique feature encapsulation capabilities within CapsNet promote effective knowledge transfer without the need for designing effective feature-wise interaction approaches to capture higher-level semantics. Extensive experiments demonstrate the robustness and generalization capabilities of XCapsUTL across multiple domains and datasets, highlighting its practical significance and utility in addressing the unique challenges of tabular data in industry settings.
Crime situations are a race against time. Police officers need an AI-assisted criminal investigation system that provides prompt yet precise legal counsel. We introduce LAPIS (Language Model Augmented Police Investigation System), an automated system that assists police officers in performing rational and legal investigative actions. We constructed a finetuning dataset and a retrieval knowledge base specialized in the crime investigation legal reasoning task. We improved the dataset's quality by incorporating manual curation efforts by a group of domain experts. We then finetuned the pretrained weights of a smaller Korean language model on the newly constructed dataset and integrated it with the crime investigation knowledge base retrieval approach. Experimental results show LAPIS' potential in providing reliable legal guidance for police officers, even better than the proprietary GPT-4 model. Qualitative analysis of the rationales generated by LAPIS demonstrates the model's reasoning ability to leverage the premises and derive legally correct conclusions.
In large-scale online experimentation platforms, experimenters aim to discover the best treatment (arm) among multiple candidates. Traditional A/B testing and multi-armed bandit (MAB) algorithms are two popular designs. The former usually achieves higher power but may hurt customer satisfaction by repeatedly recommending a poor arm, while the latter aims at improving the customer experience (collecting more rewards) but loses testing power. Recently, [26] combined the advantages of A/B testing and MAB algorithms to maximize testing power while collecting more rewards for two-arm experiments with Bernoulli rewards. However, in practice, the number of arms is usually larger than two and the reward type also varies. In multi-arm experiments, the sample size required to find the optimal arm while guaranteeing the false discovery rate blows up as the number of arms increases, bringing high opportunity costs to experimenters. To save cost during the long experimental process, we propose a more efficient sequential test framework named Soptima that can work with general reward types. Inspired by the design of traditional MAB algorithms in chasing rewards and of A/B testing in maximizing power, we propose an elimination-type strategy adapted to this framework to dynamically adjust the traffic split across arms. This strategy, combined with Soptima, simultaneously retains the advantages of A/B testing in maximizing testing power, of sequential test methods in saving sample size, and of MAB algorithms in collecting rewards. The theoretical analysis gives guarantees on the Type-I, Type-II, and optimality error rates of the proposed approach. A series of experiments on both simulation and industrial historical data sets are conducted to verify the superiority of our approach compared with available baselines.
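To give a feel for elimination-type strategies in general, the toy loop below repeatedly allocates a batch of traffic to each surviving arm and drops any arm whose upper confidence bound falls below the lower confidence bound of the current best arm. It is a textbook successive-elimination sketch with simulated Gaussian rewards, not Soptima's test statistics, error guarantees, or traffic-splitting rules.

```python
# A toy successive-elimination loop, shown only to illustrate the family of
# elimination-type strategies; the confidence radius and reward simulation are
# illustrative assumptions.
import math
import random

def eliminate_arms(true_means, rounds=200, delta=0.05, batch=50):
    arms = list(range(len(true_means)))
    counts = [0] * len(true_means)
    sums = [0.0] * len(true_means)
    for _ in range(rounds):
        for a in arms:                               # split traffic over surviving arms
            for _ in range(batch):
                sums[a] += random.gauss(true_means[a], 1.0)
                counts[a] += 1
        radius = {a: math.sqrt(2 * math.log(1 / delta) / counts[a]) for a in arms}
        best = max(arms, key=lambda a: sums[a] / counts[a])
        best_lcb = sums[best] / counts[best] - radius[best]
        arms = [a for a in arms
                if sums[a] / counts[a] + radius[a] >= best_lcb]  # drop clearly worse arms
        if len(arms) == 1:
            break
    return arms

print(eliminate_arms([0.00, 0.02, 0.10]))
```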
SQL injection (SQLi) compromises database-driven applications by enabling attackers to insert malicious SQL commands via input fields, potentially leading to unauthorized access, data manipulation, or system compromise. In recent years, alongside the development of various rule-based Web Application Firewalls (WAFs) aimed at mitigating SQL injection attacks, there has also been a notable rise in the utilization of machine learning and deep learning techniques to address this issue. Although significant progress has been made in these studies, detecting and mitigating SQLi-related attacks continues to present a significant challenge. A crucial factor contributing to the lack of extensive SQLi detection solutions is the absence of a comprehensive testing methodology. In this work, we introduce XploitSQL, an innovative approach to advancing adversarial SQL injection generation by leveraging language models and reinforcement learning. Our model is trained to produce evasive SQLi samples, enhancing the robustness of SQLi detection models and offering opportunities for more comprehensive detection strategies. To assess the efficacy of the proposed method, we employed state-of-the-art SQL injection detection models in conjunction with commercially available web-based firewalls. Across all tested detection models, detection rates declined when faced with evasive samples generated by XploitSQL. Furthermore, our model outperforms existing methods for generating attack samples.
Industrial-scale linear assignment problems (LAPs) are frequently encountered in various industrial scenarios, e.g., asset allocation within the domain of credit management. However, optimization algorithms for such problems (e.g., PJ-ADMM) are highly sensitive to hyper-parameters. Existing solving systems rely on empirical parameter selection, which makes convergence hard to achieve and is extremely time-consuming. Additionally, the resulting parameter rules are often inefficient. To alleviate this issue, we propose RL-ISLAP, an efficient and lightweight Reinforcement Learning framework for Industrial-Scale Linear Assignment Problems. We formulate the hyper-parameter selection for PJ-ADMM as a sequential decision problem and leverage reinforcement learning to enhance its convergence. Addressing the sparse reward challenge inherent in learning policies for such problems, we devise auxiliary rewards to provide dense signals for policy optimization, and present a rollback mechanism to prevent divergence in the solving process. Experiments on the OR-Library benchmark demonstrate that our method is competitive with SOTA stand-alone solvers. Furthermore, the scale-independent design of observations enables us to transfer the acquired hyper-parameter policy to LAP scenarios of varying scales. On two real-world industrial-scale LAPs with up to 10 million decision variables, our proposed RL-ISLAP achieves solutions of comparable quality in 2/3 of the time when compared to the SOTA distributed solving system employing fine-tuned empirical parameter rules.
In the field of Artificial Intelligence for Information Technology Operations, causal discovery is pivotal for operation and maintenance of systems, facilitating downstream industrial tasks such as root cause analysis. Temporal causal discovery, as an emerging method, aims to identify temporal causal relations between variables directly from observations by utilizing interventional data. However, existing methods mainly focus on synthetic datasets with heavy reliance on interventional targets and ignore the textual information hidden in real-world systems, failing to conduct causal discovery for real industrial scenarios. To tackle this problem, in this paper we investigate temporal causal discovery in industrial scenarios, which faces two critical challenges: how to discover causal relations without the interventional targets that are costly to obtain in practice, and how to discover causal relations via leveraging the textual information in systems which can be complex yet abundant in industrial contexts. To address these challenges, we propose the RealTCD framework, which is able to leverage domain knowledge to discover temporal causal relations without interventional targets. We first develop a score-based temporal causal discovery method capable of discovering causal relations without relying on interventional targets through strategic masking and regularization. Then, by employing Large Language Models (LLMs) to handle texts and integrate domain knowledge, we introduce LLM-guided meta-initialization to extract the meta-knowledge from textual information hidden in systems to boost the quality of discovery. We conduct extensive experiments on both simulation datasets and our real-world application scenario to show the superiority of our proposed RealTCD over existing baselines in temporal causal discovery.
A complementary item is an item that pairs well with another item when consumed together. In the context of e-commerce, providing recommendations for complementary items is essential for both customers and stores. Current models for suggesting complementary items often rely heavily on user behavior data, such as co-purchase relationships. However, just because two items are frequently bought together does not necessarily mean they are truly complementary. Relying solely on co-purchase data may not align perfectly with the goal of making meaningful complementary recommendations. In this paper, we introduce the concept of "coherent complement recommendation", where "coherent" implies that recommended item pairs are compatible and relevant. Our approach builds upon complementary item pairs, with a focus on ensuring that recommended items are well used together and contextually relevant. To enhance the explainability and coherence of our complement recommendations, we fine-tune the Large Language Model (LLM) with coherent complement recommendation and explanation generation tasks since LLM has strong natural language explanation generation ability and multi-task fine-tuning enhances task understanding. Experimental results indicate that our model can provide more coherent complementary recommendations than existing state-of-the-art methods, and human evaluation validates that our approach achieves up to a 48% increase in the coherent rate of complement recommendations.
Gross domestic product (GDP) nowcasting is crucial for policy-making as GDP growth is a key indicator of economic conditions. Dynamic factor models (DFMs) have been widely adopted by government agencies for GDP nowcasting due to their ability to handle irregular or missing macroeconomic indicators and their interpretability. However, DFMs face two main challenges: i) the lack of capturing economic uncertainties such as sudden recessions or booms, and ii) the limitation of capturing irregular dynamics from mixed-frequency data. To address these challenges, we introduce NCDENow, a novel GDP nowcasting framework that integrates neural controlled differential equations (NCDEs) with DFMs. This integration effectively handles the dynamics of irregular time series. NCDENow consists of 3 main modules: i) factor extraction leveraging DFM, ii) dynamic modeling using NCDE, and iii) GDP growth prediction through regression. We evaluate NCDENow against 6 baselines on 2 real-world GDP datasets from South Korea and the United Kingdom, demonstrating its enhanced predictive capability. Our empirical results favor our method, highlighting the significant potential of integrating NCDE into nowcasting models. Our code and dataset are available at https://github.com/sklim84/NCDENow_CIKM2024.
Embedding-based neural retrieval (EBR) is an effective search retrieval method in product search for tackling the vocabulary gap between customer search queries and products. The initial launch of our EBR system at Walmart yielded significant gains in relevance and add-to-cart rates [1]. However, despite EBR generally retrieving more relevant products for reranking, we have observed numerous instances of relevance degradation. Enhancing retrieval performance is crucial, as it directly influences product reranking and affects the customer shopping experience. Factors contributing to these degradations include false positives/negatives in the training data and the inability to handle query misspellings. To address these issues, we present several approaches to further strengthen the capabilities of our EBR model in terms of retrieval relevance. We introduce a Relevance Reward Model (RRM) based on human relevance feedback. We utilize the RRM to remove noise from the training data and distill it into our EBR model through a multi-objective loss. In addition, we present techniques to further improve the performance of our EBR model, such as typo-aware training and semi-positive generation. The effectiveness of our EBR is demonstrated through offline relevance evaluation, online A/B tests, and successful deployments to live production.
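One plausible reading of the multi-objective distillation step is sketched below: a margin-based retrieval loss is combined with a term that pulls the two-tower similarity toward the RRM's relevance judgment. The loss forms, the margin, and the weighting are assumptions for illustration only, not Walmart's production objective.

```python
# A hedged sketch of distilling a relevance reward model (RRM) into a two-tower
# retrieval model via a multi-objective loss; the margin, MSE distillation term,
# and weighting are illustrative assumptions.
import torch
import torch.nn.functional as F

def multi_objective_loss(query_emb, pos_emb, neg_emb, rrm_scores, lam=0.3):
    """Contrastive retrieval loss plus distillation toward RRM relevance scores."""
    pos_sim = F.cosine_similarity(query_emb, pos_emb)       # (batch,)
    neg_sim = F.cosine_similarity(query_emb, neg_emb)       # (batch,)
    retrieval = F.relu(0.2 - pos_sim + neg_sim).mean()      # margin ranking loss
    distill = F.mse_loss(pos_sim, rrm_scores)               # match RRM judgments
    return retrieval + lam * distill

q, p, n = torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64)
rrm = torch.rand(8)                                          # RRM scores in [0, 1]
print(multi_objective_loss(q, p, n, rrm))
```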
With the development of the logistics industry, the user base of logistics services has expanded swiftly. This rapid increase in user scale presents significant challenges for logistics business management. A fundamental issue in such scenarios is audience expansion, which aims to find users willing to sign up for long-term services with logistics companies to foster business growth. Existing methods for audience expansion mainly model users in an entangled manner and neglect the inherent community structure among users. Due to these limitations, the effectiveness of traditional methods in achieving accurate user expansion is often restricted. Our work introduces a novel heterogeneous graph-based model, named Hi-DGN, which concentrates on Hierarchical information propagation and aggregation in Disentangled Graph Networks for audience expansion. It consists of three main components: (i) a disentangled embedding layer to decouple user representations into different aspects, enabling the extraction of differentiated features; (ii) a hierarchical information propagation module that partitions individual nodes into distinct groups and propagates information from group nodes to individual nodes hierarchically to capture representations of diverse granularity; and (iii) an aggregation module to fuse all relation-specific embeddings into global node embeddings. Extensive experiments on two real-world datasets demonstrate the effectiveness of our method in various evaluation settings.
Recommendation systems play a crucial role in both industrial applications and research fields, aiming to understand user preferences and intentions to provide personalized services. Compared to conventional recommendations, repurchase recommendations aim to suggest suitable products that users have bought before, based on the evolution of their intentions. Existing research on product recommendation can mainly be divided into behavior sequence-based methods and graph-based methods. Although these methods represent user interests and preference features effectively, they still fail to model repurchase behaviors because (i) the environment causing repurchase intention change is neglected and (ii) the lack of feedback after purchasing makes it difficult to learn the impacts of diverse behaviors. To comprehensively consider these limitations, we design a Dual Intention-aware Fusion Network (DIFN) framework to understand the effects of environment and after-purchase feedback on users' intentions. Firstly, a hierarchical graph-based multi-level relational attention module is designed to effectively extract basic user features and spatial features from complex environmental information. Then, we introduce a behavior intention module and a usage intention module for different types of feedback data. Finally, we propose a dual intention fusion network that effectively fuses basic user features with spatial attributes and user intention features with temporal attributes for recommendation. Comprehensive evaluations on real-world datasets show that our method exceeds state-of-the-art baselines, with an average improvement of 8.2% across different metrics.
Relevance modeling plays a crucial role in e-commerce search engines, striving to identify the most pertinent items corresponding to a given search query. With the rapid advancement of pre-trained large language models (LLMs), recent endeavors have leveraged the capabilities of LLMs in relevance modeling, resulting in enhanced performance. This is usually done through the process of fine-tuning LLMs on specifically annotated datasets to determine the relevance between queries and items. However, there are two limitations when LLMs are naively employed for relevance modeling through fine-tuning and inference. First, LLMs are not inherently suited to nuanced tasks beyond simple yes or no answers, such as assessing search relevance; they therefore tend to be overconfident and struggle to distinguish fine-grained degrees of relevance (e.g., strong relevance, weak relevance, irrelevance) used in search engines. Second, they exhibit significant performance degradation when confronted with data distribution shift in real-world scenarios. In this paper, we propose a novel Distribution-Aware Robust Learning framework (DaRL) for relevance modeling in Alipay Search. Specifically, we design an effective loss function to enhance the discriminability of LLM-based relevance modeling across various fine-grained degrees of query-item relevance. To improve the generalizability of LLM-based relevance modeling, we first propose the Distribution-Aware Sample Augmentation (DASA) module. This module utilizes out-of-distribution (OOD) detection techniques to actively select appropriate samples that are not well covered by the original training set for model fine-tuning. Furthermore, we adopt a multi-stage fine-tuning strategy to simultaneously improve in-distribution (ID) and OOD performance, bridging the performance gap between them. DaRL has been deployed online to serve Alipay's insurance product search. Both offline experiments on real-world industry data and online A/B testing show that DaRL effectively improves the performance of relevance modeling.
Achieving fairness among different individuals or groups is an essential task for industrial recommender systems. Due to groups' personalized selection tendencies and non-uniform population distributions, existing industrial recommenders tend to make unfair predictions towards the preferences of minority groups. To alleviate this unfairness, we propose a model-agnostic self-adaptive fairness constraint framework (SaFair) based on the posterior preferences of different groups. We construct group-level and individual-level fairness constraints. The former measures consistency between group-level posterior preferences and predicted interests, and the latter relies on the degree of consistency in interests between a user and their associated group to perform self-adaptive constraints. In particular, to balance effectiveness and fairness, we utilize uncertainty estimation to adjust the intensity of the constraints according to the model's learning status, which is why we call them self-adaptive constraints. Extensive offline experiments and online A/B testing are conducted, and the results validate the superiority of our proposed method over the baselines. SaFair has been successfully deployed in Kuaishou, one of China's most popular short-video streaming platforms with hundreds of millions of active users.
In on-demand delivery, online orders are delivered by couriers from merchants to customers within a short time (e.g., 45 minutes). An important task is to provide an efficient order dispatching solution. Existing studies focus on scenarios with stable routing behavior using pre-determined courier-order matching before delivery while ignoring real-time dynamics during delivery. In this work, we leverage courier-courier encounter events as an opportunity to enable cooperative order dispatching (i.e., conducting order transfers among couriers during delivery) for better delivery efficiency. However, it is non-trivial to conduct encounter-aware cooperative order dispatching under real-time dynamics due to two major challenges: (i) the dynamic nature of encounters in diverse real-world scenarios, and (ii) global delivery efficiency optimization by local order transfers. To address the above challenges, we design a detection-driven cooperative dispatching framework, called DECO. Specifically, we design (i) a Received Signal Strength Indicator (RSSI) variance-based state encoder to model encounter dynamics, (ii) an encounter event selector to choose encounter scenarios, (iii) a time-constrained order mask module to filter unsuitable orders, and (iv) an encounter-aware order transfer scheduler to make detailed order transfer decisions. Extensive experiments on real-world data from two large companies (i.e., JD Logistics, Eleme) show that DECO outperforms other baselines. Real-world deployment results at JD Logistics show that DECO improves the order overdue rate by 4.8%.
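A simplified view of the RSSI-variance state encoder is that low variance of recent RSSI readings between two couriers' devices indicates a stable encounter (moving together) rather than a brief pass-by. The thresholds and labels in the sketch below are invented for illustration and are much cruder than DECO's learned encoder.

```python
# A simplified sketch of an RSSI-variance feature for encounter detection:
# low variance over a recent window suggests two couriers are moving together
# (a stable encounter) rather than briefly passing by. Thresholds are made up.
from statistics import variance

def encounter_state(rssi_window, var_threshold=9.0, strength_threshold=-70.0):
    """rssi_window: recent RSSI readings (dBm) between two courier devices."""
    if len(rssi_window) < 2:
        return "unknown"
    mean_rssi = sum(rssi_window) / len(rssi_window)
    if mean_rssi < strength_threshold:
        return "far"
    return "stable_encounter" if variance(rssi_window) < var_threshold else "transient"

print(encounter_state([-58, -60, -59, -61, -60]))   # stable_encounter
print(encounter_state([-55, -75, -62, -80, -58]))   # transient
```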
To cater to users' desire for an immersive browsing experience, numerous e-commerce platforms provide various recommendation scenarios, with a focus on Trigger-Induced Recommendation (TIR) tasks. However, the majority of current TIR methods heavily rely on the trigger item to understand user intent, lacking a higher-level exploration and exploitation of user intent (e.g., popular items and complementary items), which may result in an overly convergent understanding of users' short-term intent and can be detrimental to users' long-term purchasing experiences. Moreover, users' short-term intent shows uncertainty and is affected by various factors such as browsing context and historical behaviors, which poses challenges to user intent modeling. To address these challenges, we propose a novel model called Deep Uncertainty Intent Network (DUIN), comprising three essential modules: i) Explicit Intent Exploit Module extracting explicit user intent using the contrastive learning paradigm; ii) Latent Intent Explore Module exploring latent user intent by leveraging the multi-view relationships between items; iii) Intent Uncertainty Measurement Module offering a distributional estimation and capturing the uncertainty associated with user intent. Experiments on three real-world datasets demonstrate the superior performance of DUIN compared to existing baselines. Notably, DUIN has been deployed across all TIR scenarios in our e-commerce platform, with online A/B testing results conclusively validating its superiority.
Sustainable development is nowadays a prominent factor for the public. As a result, companies publish their sustainability visions and strategies in various reports to show their commitment to saving the environment and promoting social progress. However, not all statements in these sustainability reports are fact-based. When a company tries to mislead the public with its non-fact-based sustainability claims, greenwashing happens. To combat greenwashing, society needs effective automated approaches to identify the sustainability claims of companies in their heterogeneous reports.
In this paper, we present a new sustainability objective detection system, named GoalSpotter, that automatically identifies the environmental and social claims of companies in their heterogeneous reports. Our system extracts text blocks of diverse reports, preprocesses and labels them using domain expert annotations, and then fine-tunes transformer models on the labeled text blocks. This way, our system can detect sustainability objectives in any new heterogeneous report. As our experiments show, our system outperforms existing state-of-the-art sustainability objective detection approaches. Furthermore, our post-deployment results show the significant impacts of our system in real-world business.
The rise in cyber attacks on cyber-physical critical infrastructures, like water treatment networks, is evidenced by the growing frequency of breaches and the evolving sophistication of attack methods. Attack detection in such vulnerable critical infrastructures can be generalized into a task of anomaly detection with multivariate stream data. There are two essential challenges of this task: 1) Evolving and Shifting data streams; and 2) Robust Attack Pattern representation. Existing anomaly detection approaches, including statistical, distance, density, neural network, and graph-based methods, are not specialized in solving the spurious statistical relationships of evolving distribution shifts in sensing data streams. To address the two challenges, we propose a multi-view causal graph perspective, where 1) We build causal graphs to capture invariant anomaly patterns in varying streams; and 2) Introduce multi-view fusion for robust attack pattern representation. To implement this technical perspective, we develop a fused multi-view causal graph-aware anomaly detection framework. This framework includes two phases: 1) Multi-view Causal Graphs and Spectral Fusion, where we learn the dense view and sparse view causal graphs from sensory data streams and fuse the two causal graphs into a single weighted Laplacian matrix representation. 2) Graph Anomaly Detection, where we train a Deep Convolutional Graph Neural Network (DGCNN) on the Laplacian representation of the "Attack" and "Normal" status graphs to detect attack statuses on sensory data streams per time interval. Our framework achieves a ROC-Score of 82.4% and 93.2% on the SWaT and WADI Water Treatment Network Datasets with an improvement of 9.03% and 16.5% on the f1-score respectively when compared with the best-performing baseline methods on both the datasets.
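The fusion step described above can be pictured with a tiny example: two adjacency matrices (a dense view and a sparse view) are blended and turned into a single weighted Laplacian L = D - W. The fixed convex blend below is only a stand-in for the paper's spectral fusion step, and the matrices are made up.

```python
# A small sketch of fusing two causal-graph views into one weighted Laplacian
# (L = D - W). The fixed blending weight alpha is an illustrative stand-in for
# the spectral fusion described above.
import numpy as np

def fused_laplacian(dense_adj, sparse_adj, alpha=0.5):
    w = alpha * np.asarray(dense_adj, float) + (1 - alpha) * np.asarray(sparse_adj, float)
    w = (w + w.T) / 2.0                       # symmetrize for an undirected Laplacian
    degree = np.diag(w.sum(axis=1))
    return degree - w

dense = [[0, 0.9, 0.3], [0.9, 0, 0.2], [0.3, 0.2, 0]]
sparse = [[0, 1.0, 0.0], [1.0, 0, 0.0], [0.0, 0.0, 0]]
print(fused_laplacian(dense, sparse))
```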
Automatic extraction of attribute preferences from search queries is a critical problem in providing accurate product recommendations to customers. The task becomes even more challenging in cold-start settings where we do not have any supervised/labelled data available to train ML models. In this work, we implement a novel dataset generation pipeline (LLM-API) that leverages Large Language Models (LLMs), search logs, and proprietary product information data from an e-commerce website to create a high-quality dataset. Our proposed LLM-API pipeline is robust as it can generalize to any product category with minimal changes in the LLM prompts. For the problem of converting product search queries to API calls, we propose a multi-task schema generator model which we train on our generated dataset. Experiments on an internal test set reveal that our proposed model achieves an improvement of ≈9.6% and ≈5% in Exact Match and Micro-F1 respectively, over competitive baselines. Benchmarking our approach on a public test set of search queries further reveals gains of ≈8.6% and ≈10.5% in Exact Match and Micro-F1. We further demonstrate that our approach outperforms a state-of-the-art LLM (Claude) applied to our task using few-shot prompting and CoT reasoning, while at the same time achieving an improvement in inference latency.
This paper presents a novel approach for predicting Power Conversion Efficiency (PCE) of Organic Photovoltaic (OPV) devices, called GLaD: synergizing molecular Graphs and Language Descriptors for enhanced PCE prediction. Due to the lack of high-quality experimental data, we collect a dataset consisting of 500 pairs of OPV donor and acceptor molecules along with their corresponding PCE values, which we utilize as the training data for our predictive model. In this low-data regime, GLaD leverages properties learned from large language models (LLMs) pretrained on extensive scientific literature to enrich molecular structural representations, allowing for a multimodal representation of molecules. GLaD achieves precise predictions of PCE, thereby facilitating the synthesis of new OPV molecules with improved efficiency. Furthermore, GLaD showcases versatility, as it applies to a range of molecular property prediction tasks (BBBP, BACE, ClinTox and SIDER [45]), not limited to those concerning OPV materials. Especially, GLaD proves valuable for tasks in low-data regimes within the chemical space, as it enriches molecular representations by incorporating molecular property descriptions learned from large-scale pretraining. This capability is significant in real-world scientific endeavors like drug and material discovery, where access to comprehensive data is crucial for informed decision-making and efficient exploration of the chemical space.
In the dynamic landscape of online advertising, decoding user intent remains a pivotal challenge, particularly in the context of query classification. Swift classification models, exemplified by FastText, cater to the demand for real-time responses but encounter limitations in handling intricate queries. Conversely, accuracy-centric models like BERT introduce challenges associated with increased latency. This paper undertakes a nuanced exploration, navigating the delicate balance between efficiency and accuracy. It unveils FastText's latent potential as an 'online dictionary' for historical queries while harnessing the semantic robustness of BERT for novel and complex scenarios. The proposed Distribution-Diverse Multi-Expert (DDME) framework employs multiple teacher models trained from diverse data distributions. Through meticulous data categorization and enrichment, it elevates the classification performance across the query spectrum. Empirical results within the JD ads search system validate the superiority of our proposed approaches.
As data in the telecommunications industry becomes more voluminous and complex, extracting insightful information requires efficient and scalable systems that can effectively link and manage this data. This paper introduces a novel, multi-layered approach to managing interlinked data for Cloud Radio Access Network (CloudRAN) at Ericsson, utilizing Knowledge Graphs (KGs). Our system is structured into six distinct layers, each focusing on a specific aspect of managing interlinked data. This division enhances clarity and manageability, and promotes effective teamwork and collaborative development. A cornerstone of our architecture is its modularity, which enables the flexible exchange of components, such as the triple store, with minimal impact on the system's operations, ensuring longevity and adaptability to evolving technological trends. Moreover, we introduce novel applications in knowledge graph summarization and semantic search, specifically engineered for industrial decision-making. These innovations provide concise insights and actionable intelligence, fostering rapid and informed decision-making processes crucial for industry professionals. Finally, we discuss the lessons learned from deploying and utilizing this six-layer framework.
Understanding causal relationships between machines is crucial for fault diagnosis and optimization in manufacturing processes. Real-world datasets frequently exhibit up to 90% missing data and high dimensionality from hundreds of sensors. These datasets also include domain-specific expert knowledge and chronological order information, reflecting the recording order across different machines, which is pivotal for discerning causal relationships within the manufacturing data. However, previous methods for handling missing data in scenarios akin to real-world conditions have not been able to effectively utilize expert knowledge. Conversely, prior methods that can incorporate expert knowledge struggle with datasets that exhibit missing values. Therefore, we propose COKE to construct causal graphs in manufacturing datasets by leveraging expert knowledge and chronological order among sensors without imputing missing data. Utilizing the characteristics of the recipe, we maximize the use of samples with missing values, derive embeddings from intersections with an initial graph that incorporates expert knowledge and chronological order, and create a sensor ordering graph. The graph-generating process has been optimized by an actor-critic architecture to obtain a final graph that has a maximum reward. Experimental evaluations in diverse settings of sensor quantities and missing proportions demonstrate that our approach compared with the benchmark methods shows an average improvement of 39.9% in the F1-score. Moreover, the F1-score improvement can reach 62.6% when considering the configuration similar to real-world datasets, and 85.0% in real-world semiconductor datasets. The source code is available at https://github.com/OuTingYun/COKE.
High-frequency algorithmic trading has consistently attracted attention in both academia and industry, and is formally modeled as a near real-time sequential decision problem. DRL methods are regarded as a promising direction compared with traditional approaches, as they have shown great potential in chasing maximum cumulative return. However, financial data gathered from volatile markets change rapidly, which makes it dramatically difficult to grasp the crucial factors for effective decision-making. Existing works mainly focus on capturing temporal relations while neglecting the essential factors that cut across features. Therefore, we propose a DRL-based cross-contextual sequential optimization (CCSO) method for algorithmic trading. In particular, we employ a convolution module in the first stage to derive latent factors via inter-sequence aggregation, and apply a well-designed self-attention module in the second stage to capture market dynamics by aggregating temporal intra-sequence details. With the two-stage extractor as encoder and an RNN-based decision-maker as decoder, an Encoder-Decoder module is established as the policy network to conduct potent feature analysis and suggest action plans. We then design a dynamic programming based learning method to address the challenge of complex network updates in reinforcement learning, leading to considerable improvements in learning stability and efficiency. To the best of our knowledge, this is the first work that solves the sequential optimization problem by jointly representing trading data across time and features in a DRL framework. Extensive experiments demonstrate the superior performance of our method compared to other state-of-the-art algorithmic trading approaches on various widely-used metrics.
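As a rough illustration of the two-stage extractor described above, the following sketch stacks a 1-D convolution over the feature dimension (cross-feature factor derivation) and a temporal self-attention layer (intra-sequence dynamics); the dimensions, kernel size, and head count are illustrative assumptions rather than the paper's configuration.

```python
# Hedged sketch of a two-stage extractor: Conv1d across features, then self-attention over time.
import torch
import torch.nn as nn

class TwoStageExtractor(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(n_features, d_model, kernel_size=3, padding=1)   # stage 1: cross-feature factors
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # stage 2: temporal attention

    def forward(self, x):                                   # x: (batch, time, n_features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)    # (batch, time, d_model)
        out, _ = self.attn(h, h, h)                         # aggregate intra-sequence details
        return out                                          # would feed an RNN-based decision-maker (decoder)

x = torch.randn(8, 32, 10)                                  # 8 samples, 32 time steps, 10 market features
print(TwoStageExtractor(10)(x).shape)
```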
Recent studies highlight the potential of large language models (LLMs) to enhance content integration in recommender systems by leveraging their semantic understanding capabilities. However, directly incorporating LLMs into an online inference pipeline significantly increases computation costs for large-scale deployment, posing a practical challenge in balancing their benefits and costs. In this work, we propose the EASE framework, which enriches and aligns semantic feature embeddings using LLMs during the training phase while establishing a lightweight inference pipeline that does not directly involve LLMs. Specifically, we train a semantic adapter to align item features with LLMs and simultaneously enrich semantic embeddings through reconstruction tasks from LLMs. During inference, we retain only the item feature encoder and lightweight semantic adapter, thereby eliminating the computation overhead of resource-intensive LLMs. Our EASE framework is flexible, supporting not only text and visual features but also other pre-processed embedding features. Extensive experiments on both public and industrial datasets demonstrate that enriching semantic feature embeddings with our EASE framework yields consistent improvements in downstream click-through rate prediction tasks.
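The sketch below illustrates the general training-time alignment idea in a minimal form: a lightweight adapter is trained to reconstruct precomputed LLM embeddings from item feature embeddings, so only the feature encoder and adapter are needed at serving time. The dimensions, MLP architecture, and MSE reconstruction loss are assumptions, not the paper's exact design.

```python
# Hedged sketch: align item feature embeddings to frozen, precomputed LLM embeddings
# during training so the LLM can be dropped from the inference path.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticAdapter(nn.Module):
    def __init__(self, feat_dim: int, llm_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, llm_dim))

    def forward(self, item_feat):
        return self.net(item_feat)

adapter = SemanticAdapter(feat_dim=64, llm_dim=768)
item_feat = torch.randn(32, 64)                  # from the retained item feature encoder
llm_emb = torch.randn(32, 768)                   # precomputed offline with an LLM (training time only)
loss = F.mse_loss(adapter(item_feat), llm_emb)   # reconstruction/alignment objective
loss.backward()
# At serving time, only the feature encoder and this small adapter run; no LLM is invoked.
```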
In embedding-based retrieval, Approximate Nearest Neighbor (ANN) search enables efficient retrieval of similar items from large-scale datasets. While maximizing recall of relevant items is usually the goal of retrieval systems, a low precision may lead to a poor search experience. Unlike lexical retrieval, which inherently limits the size of the retrieved set through keyword matching, dense retrieval via ANN search has no natural cutoff. Moreover, the cosine similarity scores of embedding vectors are often optimized via contrastive or ranking losses, which make them difficult to interpret. Consequently, relying on top-K or cosine-similarity cutoff is often insufficient to filter out irrelevant results effectively. This issue is prominent in product search, where the number of relevant products is often small. This paper introduces a novel relevance filtering component (called "Cosine Adapter") for embedding-based retrieval to address this challenge. Our approach maps raw cosine similarity scores to interpretable scores using a query-dependent mapping function. We then apply a global threshold on the mapped scores to filter out irrelevant results. We are able to significantly increase the precision of the retrieved set, at the expense of a small loss of recall. The effectiveness of our approach is demonstrated through experiments on both public MS MARCO dataset and internal Walmart product search data. Furthermore, online A/B testing on the Walmart site validates the practical value of our approach in real-world e-commerce settings.
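A minimal sketch of the general idea follows: a small head predicts query-dependent mapping parameters that turn raw cosine scores into calibrated scores, after which one global threshold is applied. The two-parameter (scale, bias) sigmoid mapping is an assumption for illustration, not the exact published adapter.

```python
# Hedged sketch of a query-dependent mapping of cosine scores plus a single global threshold.
import torch
import torch.nn as nn

class CosineAdapter(nn.Module):
    def __init__(self, query_dim: int):
        super().__init__()
        self.head = nn.Linear(query_dim, 2)            # predicts (scale, bias) per query

    def forward(self, query_emb, cosine_scores):
        scale, bias = self.head(query_emb).unbind(-1)  # query-dependent mapping parameters
        return torch.sigmoid(scale.unsqueeze(-1) * cosine_scores + bias.unsqueeze(-1))

adapter = CosineAdapter(query_dim=128)
q = torch.randn(1, 128)
scores = torch.tensor([[0.82, 0.55, 0.31]])            # raw ANN cosine similarities
mapped = adapter(q, scores)
keep = mapped > 0.5                                     # one global threshold, applied after mapping
print(mapped, keep)
```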
Skill-based games offer an exceptional avenue for entertainment, fostering self-esteem, relaxation, and social satisfaction. Engagement on online skill gaming platforms, however, depends heavily on outcomes and experience (e.g., wins/losses). Understanding the factors driving increased engagement is crucial for skill gaming platforms. In this study, we aim to address two key questions: (1) "What factors are driving users to increase their engagement?" and (2) "How can we personalize users' journeys accordingly to further optimize their engagement?". On skill gaming platforms, the impact of causal relationships often manifests with a delay, which varies significantly as users' personas evolve. Without detailed information on treatments (such as timing and frequency), estimating a causal treatment effect in highly volatile game-play data becomes exceedingly challenging. This work proposes a framework called EFfECT-RL that establishes causal discovery by integrating change-point detection and explainable K-means clustering, while leveraging users' game-play and transactional data. Unlike existing methods, which were unable to detect causal effects in extremely volatile data, EFfECT-RL generates threshold trees (~79% accuracy) elucidating causal relationships. Once the causal relationship is established, we personalize treatments by developing a novel offline deep reinforcement learning-based approach. Our online recommendations show a 3% improvement in user engagement (platform-centric) with 70% relevancy (user-centric).
Recommender systems based on Graph Neural Networks (GNN) have become the state-of-the-art approach in recommendation, but they struggle in extreme cold-start settings, where most users or items lack interaction data. This paper proposes a novel framework to address this challenge in four steps: (i) a propensity model to predict item purchase behaviour, with associated explainability to identify the most relevant features; (ii) a link augmentation module to connect users based on the previously obtained similarities; (iii) a GNN-based link prediction step on the resulting dense graph; and (iv) a final re-ranking stage that increases diversity in predictions by leveraging user embeddings. By exploiting the enriched graph structure, the framework generates embeddings for cold-start users and items, enabling diverse recommendations, containing long-tail and unsold items, for both established and new users. We validate the framework's effectiveness on real-world industrial data from TIM S.p.A.
Current metric learning approaches for image retrieval are usually based on learning a space of informative latent representations where simple approaches such as the cosine distance would work well. Recent state of the art methods such as HypViT move to more complex embedding spaces that may yield better results but are harder to scale to production environments. In this work, we first construct a simpler model based on triplet loss with hard negatives mining that performs at state of the art levels but does not have these drawbacks. Second, we introduce a novel approach for image retrieval postprocessing called Siamese Transformer for Image Retrieval (STIR) that reranks several top outputs in a single forward pass. Unlike previously proposed Reranking Transformers, STIR does not rely on global/local feature extraction and directly compares a query image and a retrieved candidate on pixel level via an attention mechanism. The resulting approach defines a new state of the art on standard image retrieval datasets: Stanford Online Products and DeepFashion In-shop. We also release the source code (https://github.com/OML-Team/open-metric-learning/tree/main/pipelines/postprocessing/, a part of the Open Metric Learning library) and an interactive demo (https://dapladoc-oml-postprocessing-demo-srcappmain-pfh2g0.streamlit.app/) of our approach.
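For readers unfamiliar with the first ingredient, here is a minimal sketch of triplet loss with batch-hard negative mining, the general technique named above rather than the authors' exact training recipe; the margin value and the cosine-distance choice are assumptions.

```python
# Minimal sketch of triplet loss with batch-hard mining (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(emb, labels, margin: float = 0.2):
    emb = F.normalize(emb, dim=1)
    dist = 1.0 - emb @ emb.T                                            # pairwise cosine distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos = dist.masked_fill(~same, float("-inf")).max(dim=1).values      # hardest positive per anchor
    neg = dist.masked_fill(same, float("inf")).min(dim=1).values        # hardest negative per anchor
    return F.relu(pos - neg + margin).mean()

emb = torch.randn(16, 64, requires_grad=True)
labels = torch.randint(0, 4, (16,))
print(batch_hard_triplet_loss(emb, labels))
```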
Despite the recognized potential of multimodal data to improve model accuracy, many large-scale industrial recommendation systems, including Taobao display advertising system, predominantly depend on sparse ID features in their models. In this work, we explore approaches to leverage multimodal data to enhance the recommendation accuracy. We start from identifying the key challenges in adopting multimodal data in a manner that is both effective and cost-efficient for industrial systems. To address these challenges, we introduce a two-phase framework, including: 1) the pre-training of multimodal representations to capture semantic similarity, and 2) the integration of these representations with existing ID-based models. Furthermore, we detail the architecture of our production system, which is designed to facilitate the deployment of multimodal representations. Since the integration of multimodal representations in mid-2023, we have observed significant performance improvements in Taobao display advertising system. We believe that the insights we have gathered will serve as a valuable resource for practitioners seeking to leverage multimodal data in their systems.
Dense features, customized for different business scenarios, are essential in short video classification. However, their complexity, specific adaptation requirements, and high computational costs make them resource-intensive and less accessible during online inference. Consequently, these dense features are categorized as 'Privileged Dense Features'. Meanwhile, end-to-end multi-modal models have shown promising results in numerous computer vision tasks. In industrial applications, prioritizing end-to-end multi-modal features can enhance efficiency but often leads to the loss of valuable information from historical privileged dense features. To integrate both kinds of features while maintaining efficiency and manageable resource costs, we present Confidence-aware Privileged Feature Distillation (CPFD), which empowers the features of an end-to-end multi-modal model by adaptively distilling privileged features during training. Existing privileged feature distillation (PFD) methods apply uniform weights to all instances during distillation, potentially causing unstable performance across different business scenarios and a notable performance gap between the teacher model (dense-feature-enhanced multimodal model DF-X-VLM) and the student model (multimodal model only, X-VLM). In contrast, our CPFD leverages confidence scores derived from the teacher model to adaptively mitigate the performance variance with the student model. We conducted extensive offline experiments on five diverse tasks demonstrating that CPFD improves the video classification F1 score by 6.76% compared with the end-to-end multimodal model (X-VLM) and by 2.31% compared with vanilla PFD, on average. It also reduces the performance gap by 84.6% and achieves results comparable to the teacher model DF-X-VLM. The effectiveness of CPFD is further substantiated by online experiments, and our framework has been deployed in production systems for over a dozen models.
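The following sketch conveys the confidence-aware distillation idea in its simplest form: the per-instance distillation term is weighted by the teacher's predictive confidence instead of a uniform weight. The specific weighting function and loss mixture are assumptions, not CPFD's published formulation.

```python
# Hedged sketch: distillation loss weighted per instance by teacher confidence.
import torch
import torch.nn.functional as F

def confidence_weighted_distill_loss(student_logits, teacher_logits, labels, alpha: float = 0.5):
    ce = F.cross_entropy(student_logits, labels)
    teacher_probs = teacher_logits.softmax(dim=-1)
    confidence = teacher_probs.max(dim=-1).values.detach()              # teacher confidence per instance
    kl = F.kl_div(student_logits.log_softmax(-1), teacher_probs, reduction="none").sum(-1)
    distill = (confidence * kl).mean()                                   # non-uniform, confidence-aware weighting
    return (1 - alpha) * ce + alpha * distill

s = torch.randn(8, 5, requires_grad=True)   # student (multimodal-only) logits
t = torch.randn(8, 5)                       # teacher (dense-feature-enhanced) logits
y = torch.randint(0, 5, (8,))
print(confidence_weighted_distill_loss(s, t, y))
```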
Ads supply personalization aims to balance the revenue and user engagement, two long-term objectives in social media ads, by tailoring the ad quantity and density. In the industry-scale system, the challenge for ads supply lies in modeling the counterfactual effects of a conservative supply treatment (e.g., a small density change) over an extended duration. In this paper, we present a streamlined framework for personalized ad supply. This framework optimally utilizes information from data collection policies through the doubly robust learning. Consequently, it significantly improves the accuracy of long-term treatment effect estimates. Additionally, its low-complexity design not only results in computational cost savings compared to existing methods, but also makes it scalable for billion-scale applications. Through both offline experiments and online production tests, the framework consistently demonstrated significant improvements in top-line business metrics over months. The framework has been fully deployed to live traffic in one of the world's largest social media platforms.
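For context, the sketch below shows the textbook doubly robust estimator of an average treatment effect, the general technique such a framework builds on; the toy data, constant propensity, and outcome-model predictions are placeholders, not the production models.

```python
# Standard doubly robust (DR) estimate of an average treatment effect on toy data.
import numpy as np

def doubly_robust_ate(y, t, propensity, mu1, mu0):
    """y: outcomes, t: binary treatment, propensity: P(t=1|x), mu1/mu0: outcome-model predictions."""
    dr1 = mu1 + t * (y - mu1) / propensity
    dr0 = mu0 + (1 - t) * (y - mu0) / (1 - propensity)
    return (dr1 - dr0).mean()   # consistent if either the propensity or the outcome model is correct

rng = np.random.default_rng(0)
n = 1000
t = rng.integers(0, 2, n)
y = 0.3 * t + rng.normal(0, 1, n)           # true effect of 0.3 plus noise
print(doubly_robust_ate(y, t, propensity=np.full(n, 0.5), mu1=np.full(n, 0.3), mu0=np.zeros(n)))
```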
In the rapidly evolving field of legal analytics, finding relevant cases and accurately predicting judicial outcomes are challenging because of the complexity of legal language, which often includes specialized terminology, complex syntax, and historical context. Moreover, the subtle distinctions between similar and precedent cases require a deep understanding of legal knowledge. Researchers often conflate these concepts, making it difficult to develop specialized techniques to effectively address these nuanced tasks. In this paper, we introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain to address these challenges. LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP). By clearly distinguishing between precedent and similar cases, we provide essential clarity, guiding future research in developing specialized strategies for these tasks. We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format. Furthermore, we also use techniques such as in-context learning (ICL) and advanced information retrieval methods in LawLLM. The evaluation results demonstrate that LawLLM consistently outperforms existing baselines in both zero-shot and few-shot scenarios, offering unparalleled multi-task capabilities and filling critical gaps in the legal domain. Code and data are available at https://github.com/Tizzzzy/Law_LLM.
In large-scale recommendation systems, modeling long-term user interests is progressively gaining attention among researchers and practitioners. Existing work, such as SIM and TWIN, typically employs a two-stage approach to model long-term user behavior sequences for efficiency reasons. The first stage rapidly retrieves a subset of sequences related to the target item from a long sequence using a search-based mechanism, namely the General Search Unit (GSU), while the second stage calculates interest scores on the retrieved results using the Exact Search Unit (ESU). Given the extensive length of user behavior sequences spanning the entire life cycle, potentially reaching up to 10^6 in scale, there is currently no effective solution for fully modeling such expansive user interests. To overcome this issue, we introduce TWIN-V2, an enhancement of TWIN, in which a divide-and-conquer approach is applied to compress life-cycle behaviors and uncover more accurate and diverse user interests. Specifically, a hierarchical clustering method groups items with similar characteristics in life-cycle behaviors into a single cluster during the offline phase. By limiting the size of clusters, we can compress behavior sequences well beyond the magnitude of 10^5 to a length manageable for online inference in GSU retrieval. Cluster-aware target attention extracts comprehensive and multi-faceted long-term user interests, thereby making the final recommendation results more accurate and diverse. Extensive offline experiments on a multi-billion-scale industrial dataset and online A/B tests have demonstrated the effectiveness of TWIN-V2. Under an efficient deployment framework, TWIN-V2 has been successfully deployed to the primary traffic that serves hundreds of millions of daily active users at Kuaishou.
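As a toy illustration of the compression step, the sketch below clusters a long behavior sequence and keeps one centroid per cluster so the sequence shrinks to a length usable for retrieval; it uses flat k-means as a stand-in for the paper's hierarchical clustering, and the cluster-size cap is an assumption.

```python
# Toy sketch of divide-and-conquer sequence compression via clustering (assumptions throughout).
import numpy as np
from sklearn.cluster import KMeans

def compress_behavior_sequence(item_embs: np.ndarray, max_cluster_size: int = 20) -> np.ndarray:
    """item_embs: (seq_len, dim) embeddings of a user's life-cycle behaviors."""
    n_clusters = max(1, int(np.ceil(len(item_embs) / max_cluster_size)))   # bound the cluster size
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(item_embs)
    return km.cluster_centers_                        # compressed sequence used for retrieval

long_history = np.random.rand(5000, 32)               # stand-in for a very long behavior history
print(compress_behavior_sequence(long_history).shape) # (250, 32)
```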
Portfolio optimization is a significant concern in finance. Existing research on portfolio optimization fails to adequately learn from the long- and short-term relationships among equities, which inevitably leads to suboptimal performance. In this paper, we propose a Dynamic Graph-based Deep Reinforcement Learning (DGDRL) framework for optimal portfolio decisions. We achieve this goal by devising two mechanisms for naturally modeling the financial market. First, we utilize static and dynamic graphs to represent the long- and short-term relations, which are then naturally modeled by the proposed multi-channel graph attention neural network. Second, in contrast to the traditional two-phase approach of forecasting each equity's trend and then weighting the equities by combinatorial optimization, we optimize the portfolio decisions directly, which guides the model to converge to optimal rewards. Through extensive experiments on three real-world datasets, we demonstrate that our method significantly outperforms state-of-the-art benchmark methods in portfolio management. Furthermore, evaluation on an industrial trading system has shown the applicability of our model to real-world financial markets.
In recent years, Approximate Nearest Neighbor Search (ANNS) has played a pivotal role in modern search and recommendation systems, especially in emerging LLM applications like Retrieval-Augmented Generation. There is a growing exploration into harnessing the parallel computing capabilities of GPUs to meet the substantial demands of ANNS. However, existing systems primarily focus on offline scenarios, overlooking the distinct requirements of online applications that necessitate real-time insertion of new vectors. This limitation renders such systems inefficient for real-world scenarios. Moreover, previous architectures struggled to effectively support real-time insertion due to their reliance on serial execution streams. In this paper, we introduce a novel Real-Time Adaptive Multi-Stream GPU ANNS System (RTAMS-GANNS). Our architecture achieves its objectives through three key advancements: 1) We initially examined the real-time insertion mechanisms in existing GPU ANNS systems and discovered their reliance on repetitive copying and memory allocation, which significantly hinders real-time effectiveness on GPUs. As a solution, we introduce a dynamic vector insertion algorithm based on memory blocks, which includes in-place rearrangement. 2) To enable real-time vector insertion in parallel, we introduce a multi-stream parallel execution mode, which differs from existing systems that operate serially within a single stream. Our system utilizes a dynamic resource pool, allowing multiple streams to execute concurrently without additional execution blocking. 3) Through extensive experiments and comparisons, our approach effectively handles varying QPS levels across different datasets, reducing latency by up to 40%-80%. The proposed system has also been deployed in real-world industrial search and recommendation systems, serving hundreds of millions of users daily, and has achieved significant results.
Product recommendations incentivize customers to make multi-unit purchases by surfacing relevant products, leading to lower cost per unit for e-commerce stores and lower prices for their customers. However, the humongous scale of products, implicit co-purchase asymmetry and variation in co-purchase behavior across different categories, are orthogonal problems to solve. To address these problems, we propose MERLIN (Multimodal & Multilingual Embedding for Recommendations at Large-scale via Item associations), a Graph Neural Network that generates product recommendations from a heterogeneous and directed product graph. We mine category associations to remove noisy product co-purchase associations, leading to higher quality recommendations. Leveraging product co-view relationships, we finetune SentenceBERT model for textual representation, and train a self-supervised knowledge distillation model to learn visual representation, which allows us to learn product representations which are multi-lingual and multi-modal in nature. We selectively align node embeddings leveraging co-viewed products. MERLIN model can handle node asymmetry by learning dual embeddings for each product, and can generate recommendations for cold-start products by employing catalog metadata such as title, category and image. Extensive offline experiments on internal and external datasets show that MERLIN model outperforms state-of-the-art baselines for node recommendation and link prediction task. We conduct ablations to quantify the impact of our model components and choices. Further, MERLIN model delivers significant improvement in sales measured through an A/B experiment.
Long-Form Question Answering (LFQA) represents a growing interest in Legal Natural Language Processing (Legal-NLP) as many individuals encounter legal disputes at some point in their lives, but lack of knowledge about how to negotiate these complex situations might put them at risk. The endeavor to generate detailed answers to contextually rich legal questions has faced challenges, primarily due to the limited availability of specialized datasets involving intensive manual effort or incapability of existing LFQA models to produce informative responses. Addressing this, our research introduces a semi-synthetic dataset, Legal-LFQA (L2FQA) created by exploiting a large language model (LLM) and utilizing contexts derived from existing legal datasets. Additionally, we hypothesize that integrating legal reasoning into the answer generation process of the LLMs will help bolster both the quality and interpretability of the produced responses. We systematically analyze the quality of L2FQA using human evaluation and natural language inference based metrics. Next, we benchmark L2FQA on a wide range of general-purpose and domain-specific LLMs using fine-tuning and in-context learning (with zero, one and few shot) strategies. The efficacy of these techniques is gauged through several automated and human evaluations. Results indicate that incorporating legal reasoning into the answer generation process provides an avenue for improving the quality of responses in the context of Legal-LFQA task. By addressing the challenges faced in LFQA and emphasizing the potential of interpretability, this research contributes to the foundational work in enhancing question-answering systems within the legal domain.
Trajectory data is a valuable asset for service management and spatio-temporal mining in transportation and logistics systems. However, due to equipment failure, network delay, and energy constraints, some trajectory points may be missing, which hinders trajectory-based management. Some researchers have focused on recovering sparse trajectories from road networks and historical trajectory data, but these methods are ineffective when the road network is incomplete. Recent research has explored learning-based methods to recover trajectories in free space, but these lack user movement behavior modeling and efficient feature extraction on sparse long-range trajectories. Our work exploits the periodic behavior of couriers and fine-grained Area of Interest (AOI) data for sparse trajectory recovery in last-mile delivery. However, we face challenges with AOI access sequence deviations due to GPS inaccuracies and abnormal courier behaviors, as well as the complex, dynamic relationships within and between courier routes due to uncertain pick-up demands. To address these challenges, we design a graph-based multi-task learning framework centered on multi-scale attention fusion for end-to-end free-space trajectory recovery. Our approach starts with a behavior-aware graph network that generates detailed spatial features. Following this, we propose a multi-scale attention fusion mechanism to extract intra- and inter-trajectory features. Finally, we design a multi-task learning module that predicts both coarse-grained spatial access sequences and fine-grained trajectory points. We evaluate the model with six months of data involving more than 360,000 trajectory segments and more than 7.2 million waybills collected from one of the largest logistics companies in China. Extensive experiments on real-world datasets demonstrate that our method outperforms state-of-the-art methods on multiple metrics.
We address the real problem of safe, robust, adaptive resource oversubscription in uncertain environments with our proposed novel technique of chance-constrained imitation learning. Our objective is to enhance resource efficiency while ensuring safety against congestion risk. Traditional supervised or forecasting models are ineffective in learning adaptive oversubscription policies, and conventional online optimization or reinforcement learning is difficult to deploy on real systems. Offline policy learning methods, such as Imitation Learning (IL) can leverage historical resource utilization telemetry data to learn effective policies if we can ensure robustness and safety from the underlying uncertainty in the domain, and thus the data. Our work investigates the nature of this uncertainty, how it can be quantified and proposes a novel chance-constrained IL that implicitly models such uncertainty in a principled manner via additional knowledge in the form of stochastic constraints on the associated risk, to learn provably safe and robust policies. We show empirically a substantial improvement (~ 3-4×) in capacity efficiency and congestion safety in test as well as real deployments.
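To make the chance-constraint idea concrete, here is a hedged sketch that adds a smooth surrogate of the congestion-risk constraint P(demand > allocation) <= eps as a penalty on an imitation objective; the sigmoid surrogate, temperature, and penalty weight are assumptions rather than the paper's formulation.

```python
# Hedged sketch: imitation loss plus a smooth penalty on exceeding a congestion-risk budget.
import torch
import torch.nn.functional as F

def chance_constrained_il_loss(pred_alloc, expert_alloc, demand_samples,
                               eps=0.01, penalty=10.0, tau=0.05):
    """demand_samples: (batch, n_scenarios) Monte-Carlo draws of future utilization."""
    il_loss = F.mse_loss(pred_alloc, expert_alloc)                  # imitate the expert's allocations
    # smooth surrogate of P(demand > allocation) for each request
    violation = torch.sigmoid((demand_samples - pred_alloc.unsqueeze(-1)) / tau).mean(dim=-1)
    chance_penalty = F.relu(violation - eps).mean()                 # charge only risk above the budget eps
    return il_loss + penalty * chance_penalty

pred = torch.tensor([0.80, 0.70], requires_grad=True)               # proposed oversubscription levels
expert = torch.tensor([0.75, 0.65])
demand = torch.rand(2, 1000) * 0.9                                   # simulated utilization scenarios
print(chance_constrained_il_loss(pred, expert, demand))
```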
Accurate workload forecasting is critical for efficient resource management in cloud computing systems, enabling effective scheduling and autoscaling. Despite recent advances with transformer-based forecasting models, challenges remain due to the non-stationary, nonlinear characteristics of workload time series and the long-term dependencies. In particular, inconsistent performance between long-term history and near-term forecasts hinders long-range predictions. This paper proposes a novel framework leveraging self-supervised multiscale representation learning to capture both long-term and near-term workload patterns. The long-term history is encoded through multiscale representations while the near-term observations are modeled via temporal flow fusion. These representations of different scales are fused using an attention mechanism and characterized with normalizing flows to handle non-Gaussian/non-linear distributions of time series. Extensive experiments on 9 benchmarks demonstrate superiority over existing methods.
Human behavior prediction is an essential AI-based task that has inspired many real-world applications. In last-mile logistics, predicting couriers' behavior can benefit courier preference learning and workflow optimization. In this paper, we focus on predicting courier workload and quantify it by the working time spent at each area of interest (AOI). Given the behavioral interpretability of inverse reinforcement learning (IRL), existing studies have applied IRL to some real-world transportation prediction scenarios. However, in last-mile logistics, the platform assigns multiple orders to each courier, and couriers also receive new tasks in real time, which additionally influence their subsequent decisions. The uncertainty in decision spaces and the dynamic workflow distribution make it more challenging to predict couriers' working time. We therefore propose CourIRL, a practical IRL-based framework leveraging cross-attention to integrate couriers' historical and spatio-temporal features to predict their future working time. CourIRL formulates the couriers' pick-up and delivery tour as a sequential decision-making process and designs a model-free IRL to learn decision-making preference vectors. A deep regression model based on a multi-head cross-attention mechanism is proposed for fine-grained working-time prediction. The results of extensive experiments on two real-world datasets demonstrate that the proposed CourIRL surpasses the state-of-the-art baselines by an average of 6.11% across settings, showing the efficacy and potential contributions of CourIRL in last-mile logistics.
Large language model (LLM) applications in cloud root cause analysis (RCA) have been actively explored recently. However, current methods still rely on manual workflow settings and do not unleash LLMs' decision-making and environment-interaction capabilities. We present RCAgent, a tool-augmented LLM autonomous agent framework for practical and privacy-aware industrial RCA usage. Running on an internally deployed model rather than GPT families, RCAgent is capable of free-form data collection and comprehensive analysis with tools. Our framework combines a variety of enhancements, including a unique Self-Consistency for action trajectories and a suite of methods for context management, stabilization, and importing domain knowledge. Our experiments show RCAgent's evident and consistent superiority over ReAct across all aspects of RCA (predicting root causes, solutions, evidence, and responsibilities) and across tasks covered or not covered by current rules, as validated by both automated metrics and human evaluations. Furthermore, RCAgent has already been integrated into the diagnosis and issue discovery workflow of the Real-time Compute Platform for Apache Flink of Alibaba Cloud.
Accurate prediction of Order Fulfillment Cycle Time (OFCT) is essential for improving customer satisfaction and operational efficiency within the domain of on-demand grocery retailing (OGR). OGR platforms typically rely on Front Distribution Centers (FDCs) to manage inventory and deploy dedicated fleets for last-mile delivery to fulfill customer demands. Orders are processed at FDCs initially and then dispatched to delivery fleets. OFCT is influenced by a multitude of factors such as order volume, processing capabilities, delivery capacities, and dispatching strategies. These factors pose significant challenges to refining OFCT prediction accuracy. This paper presents an innovative deep learning model informed by a detailed comprehension of the order fulfillment process, with the objective of significantly enhancing OFCT prediction precision. We employ Recurrent Neural Network (RNN) blocks to dynamically evaluate the workload across processing and delivery stages. To address the interactions among orders and the impact of latent courier dynamics on order prioritization, we incorporate a suite of specialized attention modules into our framework. Our approach further employs Deep Bayesian Multi-Target Learning (DBMTL) to discern the sequential interactions between various stages of order fulfillment, thereby elucidating the influence of earlier stages on subsequent ones. Through online experiments on Meituan-Maicai, one of the biggest OGR platforms in China, our model demonstrates its superiority by outperforming well-acknowledged and advanced baselines. Furthermore, we assess the contributions of specific designs in our model through ablation studies. Our research presents a notable advancement in OFCT prediction, providing valuable insights for OGR platforms seeking to optimize their fulfillment operations and enhance customer experiences.
Urban and peri-urban (UPU) food systems encounter challenges in sustainability and are fragile and vulnerable to shocks. Addressing these issues is one of the key drivers of food sharing initiatives (FSIs) which focus on collective acts around food across the food system. FSIs range from seed sharing and surplus food redistribution to community composting. We describe our development and deployment of web retrieval and content classification tools designed to provide automated mapping of FSIs at scale to populate databases of FSIs within cities. We present our novel automated system tailored for retrieving, identifying, categorizing and real-time monitoring of FSIs in over 200 European cities. Developed within the European CULTIVATE project, this system not only aids in comprehending the complex dynamics of the food sharing economy, but also enhances its visibility and operational efficiency. The automation of these processes plays a vital role in supporting the goals of the CULTIVATE project, notably in promoting sustainable food practices and resilient local food networks. Our system integrates web search using queries constructed automatically using domain-specific vocabulary resources with Large Language Model (LLM) query writing and classification methods. Experimental results using a collection of data derived from real online FSI content underscore the potential of digital automation to make significant contributions to innovative digital solutions to contemporary sustainability challenges. As such, the findings of this work pave the way for future research and implementation in similar contexts.
As an important data resource containing spatial information, addresses record the geospatial information corresponding to social production activities and human behavioral activities. How to effectively encode addresses has always been a core challenge in the field of Geographic Information Systems (GIS). Pre-trained Models (PTMs) designed for Natural Language Processing (NLP) have emerged as the dominant tools for encoding semantic information in text. Though promising, these NLP-based PTMs fall short of encoding the geographic knowledge in addresses, which limits their application potential in geospatial tasks. To tackle this problem, this study proposes a Geography-Graph Pre-trained model (G2PTL) that combines graph learning and text pre-training, aiming to make up for the shortcomings of traditional PTMs in the geographic domain. Specifically, we first utilize real-world delivery data to build a large-scale heterogeneous graph of addresses, which contains abundant geographic knowledge and spatial topology information. Then, G2PTL is pre-trained with subgraphs sampled from the heterogeneous graph. Through experimental evaluation on multiple downstream GIS tasks, including geocoding, geographic entity prediction, and geographic entity recognition, G2PTL demonstrates significant performance improvements. G2PTL has been successfully deployed in production-level GIS, such as Cainiao's logistics system, effectively improving the execution efficiency and accuracy of address-related tasks. This research not only provides a new technical path for encoding and processing geographic information, but also opens up a new perspective for the study of pre-trained models in the geographic field. The code resources of the G2PTL model have been opened for research and application developers to access and use at https://huggingface.co/Cainiao-AI/G2PTL.
As the last stage of a typical recommendation system, collective recommendation aims to give the final touches to the recommended items and their layout so as to optimize overall objectives such as diversity and whole-page relevance. In practice, however, the interaction dynamics among the recommended items, their visual appearances, and meta-data such as specifications are often too complex to be captured by experts' heuristics or simple models. To address this issue, we propose a diversity-aware self-correcting sequential recommendation network (DivNet) that is able to estimate utility by capturing the complex interactions among sequential items and diversify recommendations simultaneously. Experiments in both offline and online settings demonstrate that DivNet can achieve better results compared to baselines with or without collective recommendations.
In the rapidly evolving field of e-commerce, the effectiveness of search re-ranking models is crucial for enhancing user experience and driving conversion rates. Despite significant advancements in feature representation and model architecture, the integration of multimodal information remains underexplored. This study addresses this gap by investigating the computation and fusion of textual and visual information in the context of re-ranking. We propose Advancing Re-ranking with Multimodal Fusion and Target-Oriented Auxiliary Tasks (ARMMT), which integrates an attention-based multimodal fusion technique and an auxiliary ranking-aligned task to enhance item representation and improve targeting capabilities. This method not only enriches the understanding of product attributes but also enables more precise and personalized recommendations. Experimental evaluations on JD.com's search platform demonstrate that ARMMT achieves state-of-the-art performance in multimodal information integration, evidenced by a 0.22% increase in the Conversion Rate (CVR), significantly contributing to Gross Merchandise Volume (GMV). This pioneering approach has the potential to revolutionize e-commerce re-ranking, leading to elevated user satisfaction and business growth.
With the rapid development of recommender systems, there is increasing side information that can be employed to improve recommendation performance. Specifically, we focus on the utilization of the associated textual data of items (e.g., product titles) and study how text features can be effectively fused with ID features in sequential recommendation. In this paper, we propose a novel Text-ID semantic fusion approach for sequential Recommendation, namely TedRec. The core idea is to conduct sequence-level semantic fusion by integrating global contexts. The key strategy is to transform the text and ID embeddings by Fourier Transform from the time domain to the frequency domain. In the frequency domain, the global sequential characteristics of the original sequences are inherently aggregated into the transformed representations, so that we can employ simple multiplicative operations to effectively fuse the two kinds of item features. Our fusion approach can be proven to have the same effect as contextual convolution, thereby achieving sequence-level semantic fusion. Further, we propose to enhance the discriminability of the text embeddings from the text encoder by adaptively injecting positional information via a mixture-of-experts (MoE) modulation method. Both offline and online experiments demonstrate the effectiveness of our approach.
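The frequency-domain fusion step can be sketched in a few lines: FFT the text and ID embedding sequences along the time axis, multiply element-wise (equivalent to a circular convolution in the time domain), and transform back. The shapes and the absence of learnable filters here are simplifying assumptions rather than the paper's full design.

```python
# Hedged sketch of sequence-level fusion in the frequency domain.
import torch

def frequency_domain_fusion(text_seq: torch.Tensor, id_seq: torch.Tensor) -> torch.Tensor:
    """text_seq, id_seq: (batch, seq_len, dim) item-text and item-ID embedding sequences."""
    text_f = torch.fft.rfft(text_seq, dim=1)        # time -> frequency along the sequence axis
    id_f = torch.fft.rfft(id_seq, dim=1)
    fused_f = text_f * id_f                         # simple multiplicative fusion in the frequency domain
    return torch.fft.irfft(fused_f, n=text_seq.size(1), dim=1)

text_seq = torch.randn(4, 20, 64)
id_seq = torch.randn(4, 20, 64)
print(frequency_domain_fusion(text_seq, id_seq).shape)   # (4, 20, 64)
```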
As the capabilities of smart sensing and mobile technologies continue to evolve and expand, storing diverse sensor data on smartphones and cloud servers becomes increasingly challenging. Effective data compression is crucial to alleviate these storage pressures. Compressed sensing (CS) offers a promising approach, but traditional CS methods often struggle with the unique characteristics of sensor data, such as variability, dynamic changes, and differing sampling rates, leading to slow processing and poor reconstruction quality. To address these issues, we developed Mob-ISTA-1DNet, an innovative CS framework that integrates deep learning with the iterative shrinkage-thresholding algorithm (ISTA) to adaptively compress and reconstruct smartphone sensor data. This framework is designed to manage the complexities of smartphone sensor data, ensuring high-quality reconstruction across diverse conditions. We developed a mobile application to collect data from 30 volunteers over one month, including accelerometer, gyroscope, barometer, and other sensor measurements. Comparative analysis reveals that Mob-ISTA-1DNet not only enhances reconstruction accuracy but also significantly reduces processing time, consistently outperforming other methods in various scenarios.
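For reference, the classical ISTA iteration that such frameworks unroll into a network looks as follows; this is the textbook algorithm on synthetic data, not Mob-ISTA-1DNet itself, and the step size, sparsity weight, and iteration count are illustrative.

```python
# Textbook ISTA for sparse recovery from compressed measurements (illustrative only).
import numpy as np

def soft_threshold(x, thresh):
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def ista(A, y, lam=0.1, n_iter=200):
    step = 1.0 / np.linalg.norm(A, 2) ** 2            # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - step * A.T @ (A @ x - y), step * lam)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 200))                        # compressive measurement matrix (50 << 200)
x_true = np.zeros(200)
x_true[[3, 40, 99]] = [1.0, -2.0, 0.5]                # sparse ground-truth signal
print(np.round(ista(A, A @ x_true), 2)[[3, 40, 99]])  # recovered values near the true nonzeros
```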
Video recommender systems (RSs) have gained increasing attention in recent years. Existing mainstream RSs focus on optimizing the matching function between users and items. However, we noticed that users frequently encounter playback issues such as slow loading or stuttering while browsing the videos, especially in weak network conditions, which will lead to a subpar browsing experience, and may cause users to leave, even when the video content and recommendations are superior. It is quite a serious issue, yet easily overlooked.
To tackle this issue, we propose an on-device Gating and Ranking Framework (GRF) that cooperates with the server-side RS. Specifically, we utilize a gate model to identify videos that may have playback issues in real time, and then we employ a ranking model to select the optimal result from a locally cached pool to replace the stuttering videos. Our solution has been fully deployed on Kwai, a large-scale short video platform with hundreds of millions of users globally. Moreover, it significantly enhances video playback performance and improves overall user experience and retention rates.
Despite the growing volume of time series data across various domains, detecting anomalies remains challenging due to the complexity and dynamic nature of the data. Traditional monitoring systems are inefficient at capturing contextual and temporal anomalies that are only visible over time, and at handling high-dimensional data. From implementation to deployment, this paper presents an anomaly detection system for a cyber-physical system that integrates Variational Autoencoders (VAE) with Long Short-Term Memory (LSTM) networks and a One-Class Support Vector Machine (OCSVM), forming a hybrid VAE-LSTM-OCSVM model. The proposed architecture positions itself as a zero-day anomaly detector: by learning the nominal functioning of the system, it can identify deviations from normal operations without prior knowledge of specific anomalies. This capability significantly enhances the model's utility in online monitoring, making it adept at detecting unforeseen operational disruptions. We propose an Adaptive Loss Weight Adjustment Algorithm (ALWAA) to account for domain-incremental learning in our system, as required by the ISO/IEC 42001:2023 and ISO/IEC 23053:2022 standards. The model is evaluated on a dataset including two types of anomalies, demonstrating its superiority over existing methods. The findings suggest that the hybrid VAE-LSTM-OCSVM model offers a promising direction for more effective and efficient anomaly detection in time series data, with its ability to safeguard against both known and unknown anomalies.
Biomarker discovery is vital in advancing personalized medicine, offering insights into disease diagnosis, prognosis, and therapeutic efficacy. Traditionally, the identification and validation of biomarkers heavily depend on extensive experiments and statistical analyses. These approaches are time-consuming, demand extensive domain expertise, and are constrained by the complexity of biological systems. These limitations motivate us to ask: Can we automatically identify the effective biomarker subset without substantial human efforts? Inspired by the success of generative AI, we think that the intricate knowledge of biomarker identification can be compressed into a continuous embedding space, thus enhancing the search for better biomarkers. Thus, we propose a new biomarker identification framework with two important modules: 1) training data preparation; and 2) embedding-optimization-generation. The first module uses a multi-agent system to automatically collect pairs of biomarker subsets and their corresponding prediction accuracy as training data. These data establish a strong knowledge base for biomarker identification. The second module employs an encoder-evaluator-decoder learning paradigm to compress the knowledge of the collected data into a continuous space. Then, it utilizes gradient-based search techniques and autoregressive-based reconstruction to efficiently identify the optimal subset of biomarkers. Finally, we conduct extensive experiments on three real-world datasets to show the efficiency, robustness, and effectiveness of our method.
Assigning orders to drivers under a localized spatiotemporal context (micro-view order-dispatching) is a major task at Didi, as it influences the ride-hailing service experience. Existing industrial solutions mainly follow a two-stage pattern that incorporates heuristic or learning-based algorithms with naive combinatorial methods, tackling the uncertainty of both sides' behaviors, including emerging timings, spatial relationships, and travel durations. In this paper, we propose a one-stage, end-to-end reinforcement learning based order-dispatching approach that solves behavior prediction and combinatorial optimization uniformly in a sequential decision-making manner. Specifically, we employ a two-layer Markov Decision Process framework to model this problem, and present Deep Double Scalable Network (D2SN), an encoder-decoder structure network to generate order-driver assignments directly and stop assignments accordingly. Besides, by leveraging contextual dynamics, our approach can adapt to behavioral patterns for better performance. Extensive experiments on Didi's real-world benchmarks confirm that the proposed approach significantly outperforms competitive baselines on matching efficiency and user experience tasks. In addition, we outline the deployment and discuss the gains and experience obtained during deployment tests from the perspective of large-scale engineering implementation.
This paper introduces a new structural causal model tailored for representing threshold-based IT systems and presents a new algorithm designed to rapidly detect root causes of anomalies in such systems. When root causes are not causally related, the method is proven to be correct; while an extension is proposed based on the intervention of an agent to relax this assumption. Our algorithm and its agent-based extension leverage causal discovery from offline data and engage in subgraph traversal when encountering new anomalies in online data. Our extensive experiments demonstrate the superior performance of our methods, even when applied to data generated from alternative structural causal models or real IT monitoring data.
Ranking models play an important role in industrial recommendation systems. However, most ranking models are trained only with the observed items but used to retrieve all items in the entire space, which may suffer from the sample selection bias and the exposure bias. Inspired by the entire space learning framework, we carry out detailed data analyses on large-scale unobserved items and find that they contain quite a few "potentially-positive" samples. In this paper, we propose an "Extract and Transfer" (EAT) framework, utilizing quantities of unobserved items and other domains' data to construct more training data for ranking models. Specifically, we first extract "potentially-positive" samples and negative ones according to their ranking scores from the unobserved data, and then design an Entire Space Transfer Learning (ESTL) model to transfer knowledge between observed and unobserved samples, instead of directly mixing them together to avoid negative transfer. Experiments on production data collected from Taobao validate the proposed method's superiority. Besides, we have deployed EAT on the Taobao recommendation system, obtaining 6.22% IPV (Item Page View) and 3.77% CTR improvement. The code is available at https://github.com/Recommender1/EAT.git1.
In industrial recommendation systems on websites and apps, it is essential to recall and predict top-n results relevant to user interests from a content pool of billions within milliseconds. To cope with continuous data growth and improve real-time recommendation performance, we have designed and implemented a high-performance batch query architecture for real-time recommendation systems. Our contributions include optimizing hash structures with a cacheline-aware probing method to enhance coalesced hashing, as well as the implementation of a hybrid storage key-value service built upon it. Our experiments indicate this approach significantly surpasses conventional hash tables in batch query throughput, achieving up to 90% of the query throughput of random memory access when incorporating parallel optimization. The support for NVMe, integrating two-tier storage for hot and cold data, notably reduces resource consumption. Additionally, the system facilitates dynamic updates, automated sharding of attributes and feature embedding tables, and introduces innovative protocols for consistency in batch queries, thereby enhancing the effectiveness of real-time incremental learning updates. This architecture has been deployed and in use in the bilibili recommendation system for over a year, a video content community with hundreds of millions of users, supporting 10x increase in model computation with minimal resource growth, improving outcomes while preserving the system's real-time performance.
Recommendation systems are widely used on e-commerce websites and online platforms to address information overload. However, existing systems primarily rely on historical data and user feedback, making it difficult to capture user intent transitions. Recently, Knowledge Base (KB)-based models have been proposed to incorporate expert knowledge, but they struggle to adapt to new items and the evolving e-commerce environment. To address these challenges, we propose a novel Large Language Model based Complementary Knowledge Enhanced Recommendation System (LLM-KERec). It introduces an entity extractor that extracts unified concept terms from item and user information. To provide cost-effective and reliable prior knowledge, entity pairs are generated based on entity popularity and specific strategies. The large language model determines complementary relationships in each entity pair, constructing a complementary knowledge graph. Furthermore, a new complementary recall module and an Entity-Entity-Item (E-E-I) weight decision model refine the scoring of the ranking model using real complementary exposure-click samples. Extensive experiments conducted on three industry datasets demonstrate the significant performance improvement of our model compared to existing approaches. Additionally, detailed analysis shows that LLM-KERec enhances users' enthusiasm for consumption by recommending complementary items. In summary, LLM-KERec addresses the limitations of traditional recommendation systems by incorporating complementary knowledge and utilizing a large language model to capture user intent transitions, adapt to new items, and enhance recommendation efficiency in the evolving e-commerce landscape.
The chronological sequence of user-item interactions is a key feature in recommender systems, as it reveals the transition of users' interests as well as the contextual relevance between nearby items. In modern e-commerce applications, various scenarios are usually integrated in one entry page, and the behavior sequence tends to be a combination of user-item interactions across multiple domains, such as on-sale goods, search queries, short videos, and livestreams. However, traditional domain-specific recommendations only deal with the interactions within the target domain, which neglects the overall profile depicted by behavior across the entire application, leading to overestimation of retargeted items as well as underestimation of unseen ones. It is therefore crucial to leverage cross-domain data from prominent domains to better supplement user behavior sequences for our targets. To tackle this problem, we propose the Enhanced Cross-domain Relation Transfer (ECRT) framework to make flexible sequence augmentation with the assistance of cross-domain information. We first employ similarity-based retrieval to obtain relevant sequence information from neighbor domains and build a heterogeneous graph to represent the complex behavior of users. Then we use innovative mining approaches to sample relevant information from the graph to supplement users' behavior sequences, and a hierarchical gated attention structure to aggregate this augmented information. We apply our proposed method to livestream recommendation on Taobao channel pages, and the final experimental results indicate that our method performs excellently in both online and offline environments, exceeding past SOTA methods by up to 3.6% on the main online indicators.
Dynamic image advertising is an add-on service in search advertising that matches visuals to search ads in real time. However, the image matching system encompasses various sub-tasks with different objectives, increasing the complexity of achieving global optimization. Besides, prevalent long-tailed data poses a challenge to multimodal representation learning in dynamic image advertising. Recently, vision-language pre-trained models have achieved remarkable performance across a variety of multimodal tasks and have been implemented as the foundational representation model in e-commerce scenarios. In this paper, to improve multimodal content understanding in Dynamic Image adVERtising, we present a viSion-language rEpresentation model (referred to as DIVERSE) that learns with cross-view and cross-token contrastive losses. Moreover, with large-scale curated advertising image-text data and extensive efficient training techniques, we scale DIVERSE to 12 billion parameters, making it the largest Chinese multimodal representation model in industrial practice. Experimental results demonstrate the distinct advantages of DIVERSE12B on business datasets, with competitive performance on public benchmarks. Further evaluation in downstream applications, including ad text-image retrieval, text-image relevance modeling, and image content moderation, shows that it outperforms previous separately trained models across offline and online metrics. DIVERSE12B has been deployed on the primary traffic of Baidu Search Ads, bringing considerable gains in user experience and in revenue for advertisers and the search engine.
Accurately predicting the probabilities of user feedback, such as clicks and conversions, is critical for advertisement ranking and bidding. However, there often exist unwanted mismatches between predicted probabilities and true likelihoods due to the rapid shift of data distributions and intrinsic model biases. Calibration aims to address this issue by post-processing model predictions, and field-aware calibration can adjust model output on different feature field values to satisfy fine-grained advertising demands. Unfortunately, the observed samples corresponding to certain field values can be too limited to support confident calibration, which may yield bias amplification and online disturbance. In this paper, we propose a confidence-aware multi-field calibration method, which adaptively adjusts the calibration intensity based on confidence levels derived from sample statistics. It also utilizes multiple fields for joint model calibration according to their importance to mitigate the impact of data sparsity on any single field. Extensive offline and online experiments show the superiority of our method in boosting advertising performance and reducing prediction deviations.
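The abstract does not state the calibration formula; as a rough illustration of the general idea behind confidence-aware calibration on a single feature field (not the authors' method), the sketch below shrinks each field value's multiplicative correction toward the global correction in proportion to how many samples support it. The function name, the shrinkage rule, and `prior_strength` are illustrative assumptions.

```python
import numpy as np

def field_calibration(preds, labels, field_values, prior_strength=100.0):
    """Toy confidence-aware calibration for one feature field: the correction
    (observed rate / predicted rate) for each field value is blended with the
    global correction, weighted by how many samples that value has."""
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    field_values = np.asarray(field_values)

    global_ratio = labels.mean() / max(preds.mean(), 1e-8)
    corrections = {}
    for v in np.unique(field_values):
        mask = field_values == v
        n = mask.sum()
        local_ratio = labels[mask].mean() / max(preds[mask].mean(), 1e-8)
        confidence = n / (n + prior_strength)  # more samples -> trust the local statistics
        corrections[v] = confidence * local_ratio + (1 - confidence) * global_ratio
    return corrections

# Example: per-value corrections for an "ad category" field.
preds  = [0.10, 0.12, 0.08, 0.30, 0.25]
labels = [0, 1, 0, 1, 0]
fields = ["cat_a", "cat_a", "cat_a", "cat_b", "cat_b"]
print(field_calibration(preds, labels, fields))
```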
Logistics platforms provide real-time door-to-door order pickup services to enhance customer convenience. However, a high volume of unexpected order cancellations negatively impacts both customer satisfaction and logistics profitability. Identifying whether these cancellations are due to customers' decisions or couriers' behaviors is crucial for implementing targeted operational improvements. While traditional methods directly interpret customer-courier dialogues, incorporating situational context (e.g., couriers' historical performance and current workloads) helps us to accurately understand the hidden content. The main challenge lies in dynamically correlating couriers' varying behaviors with dialogue content. To tackle this challenge, we develop COCO, a cause identification framework for order cancellation in logistics, which includes: i) Multi-modal feature exploration, which analyzes dialogues and couriers' behaviors (both historical and current); ii) Multi-modal feature aggregation, which uses a hierarchical attention mechanism to adaptively capture the dynamic correlations within dialogues and behaviors; iii) LLM-enhanced refinement, which leverages Large Language Models to accurately process a large number of unlabeled dialogues, significantly enhancing COCO's generalization and performance. Our extensive evaluation with JD Logistics demonstrates COCO's exceptional performance, achieving a 12.2% increase in precision and a 9.1% improvement in recall over existing methods. Furthermore, after deploying COCO at JD Logistics, it has achieved an accuracy of 89.5%, further demonstrating its practical utility.
Accurate prediction of order transportation time is essential for customer satisfaction in logistics. Existing methods based on origin-destination (OD) pairs do not consider the diversity of road segments, while route-based methods may fail to account for real-time traffic conditions due to the infrequent dispatch schedules of logistics vehicles. In reality, e-commerce platforms have collaborated with multiple logistics companies for parcel delivery, providing a richer dataset that offers a more comprehensive view of real-time transportation conditions. The key insight is that data from one company can serve as internal capability detectors, while data from others can act as external environment detectors. However, a significant challenge arises in inferring travel-time-correlated station pairs across different companies, especially without full disclosure of station information. To address this, we design an Adaptive cross-platform Transportation time prediction framework built upon a hypergraph structure, named AdaTrans, comprising: i) a spatial-temporal routing graph learner that employs node-centric and edge-centric hyperedges to address the complex, non-pairwise correlations among stations and station pairs within and across companies; and ii) a spatial-temporal graph-based transportation time predictor that utilizes multi-task learning to enhance overall transportation time prediction by leveraging the correlations between interconnected sub-tasks (i.e., dwell and travel time prediction). Extensive evaluation with real-world data collected from JD.com, a leading e-commerce platform in China, demonstrates that consolidating records from other companies reduces RMSE, MAE, and MAPE by 12.63%, 5.18%, and 16.67%, respectively, compared to state-of-the-art methods.
Stock price prediction is a challenging problem in the field of finance and receives widespread attention. In recent years, with the rapid development of technologies such as deep learning and graph neural networks, an increasing number of research methods have begun to explore the interrelationships between stocks. However, existing methods mostly focus on the short-term dynamic relationships of stocks and directly integrate relationship information with temporal information. They often overlook the complex nonlinear dynamic characteristics and potential higher-order interaction relationships among stocks in the stock market. Therefore, we propose a stock price trend prediction model named LSR-IGRU in this paper, which is based on long short-term stock relationships and an improved GRU input. Firstly, we construct a long short-term relationship matrix between stocks, where secondary industry information is employed for the first time to capture long-term relationships of stocks, and overnight price information is utilized to establish short-term relationships. Next, we improve the inputs of the GRU model at each step, enabling the model to more effectively integrate temporal information and long short-term relationship information, thereby significantly improving the accuracy of predicting stock trend changes. Finally, through extensive experiments on multiple datasets from stock markets in China and the United States, we validate the superiority of the proposed LSR-IGRU model over the current state-of-the-art baseline models. We also apply the proposed model to the algorithmic trading system of a financial company, achieving significantly higher cumulative portfolio returns compared to other baseline methods. Our source code is released at https://github.com/ZP1481616577/Baselines_LSR-IGRU.
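The abstract does not specify how the relationship information enters the GRU at each step; a minimal sketch of the general idea of augmenting each step's input with relation-aggregated neighbor features (shapes, names, and the aggregation rule are assumptions, not the LSR-IGRU formulation) is:

```python
import torch
import torch.nn as nn

class RelationAwareGRUStep(nn.Module):
    """Toy GRU step whose input concatenates a stock's own features with a
    relation-weighted average of the features of related stocks."""

    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.cell = nn.GRUCell(2 * feat_dim, hidden_dim)

    def forward(self, x, relation, h):
        # x:        (num_stocks, feat_dim)    features at the current time step
        # relation: (num_stocks, num_stocks)  long/short-term relation weights
        # h:        (num_stocks, hidden_dim)  previous hidden state
        relation = relation / relation.sum(dim=1, keepdim=True).clamp(min=1e-8)
        neighbor_info = relation @ x  # aggregate features of related stocks
        return self.cell(torch.cat([x, neighbor_info], dim=-1), h)

step = RelationAwareGRUStep(feat_dim=8, hidden_dim=16)
x, rel, h = torch.randn(5, 8), torch.rand(5, 5), torch.zeros(5, 16)
print(step(x, rel, h).shape)  # torch.Size([5, 16])
```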
The job marketplace is a heterogeneous graph composed of interactions among members (job-seekers), companies, and jobs. Understanding and modeling the job marketplace can benefit both job seekers and employers, ultimately contributing to the greater good of society. However, existing graph neural network (GNN)-based methods have only a shallow understanding of the associated textual features and heterogeneous relations. To address the above challenges, we propose PLM4Job, a job marketplace foundation model that tightly couples pretrained language models (PLMs) with the job market graph, aiming to fully utilize the pretrained knowledge and reasoning ability to model member/job textual features as well as various member-job relations simultaneously. In the pretraining phase, we propose a heterogeneous ego-graph-based prompting strategy to model and aggregate member/job textual features based on the topological structure around the target member/job node, where entity type embeddings and graph positional embeddings are introduced accordingly to model different entities and their heterogeneous relations. Meanwhile, a proximity-aware attention alignment strategy is designed to dynamically adjust the attention of the PLM on ego-graph node tokens in the prompt, such that the attention can be better aligned with job marketplace semantics. Extensive experiments at LinkedIn demonstrate the effectiveness of PLM4Job.
Online food delivery (OFD) services, known for offering varied meals at home, have gained global popularity. Meituan has recently ventured into the affordable market segment with its "Pinhaofan" service, highlighting the importance of delivery efficiency. To achieve this, delivery scope is regarded as one of the most effective operational tools. The delivery scope of a merchant refers to the geographical area where they can serve customers. Current methods for generating delivery scopes primarily focus on optimizing a single merchant's efficiency or rely on manual delineation from the merchant's perspective, neglecting the merchant substitution effect and potentially resulting in order loss. In this paper, we propose a novel method, named Collaborative Scope, which views the delivery scope as an assortment optimization problem, considering the substitution effect between merchants from the user's perspective. We introduce the discrete choice model of econometrics and use the Enhanced Multinomial Logit Model to predict user conversion rates in the merchant list. Next, we formulate the delivery scope optimization problem of multiple merchants as a mixed integer programming problem. The city-wide solution of this problem, owing to the large-scale combinatorial optimization triggered by high-dimensional decision variables, incurs high computational complexity. To address this, we propose an approximate solution to the original problem through a first-order Taylor series approximation, which significantly reduces the computational complexity at the expense of a slight decrease in solution accuracy. Offline and online A/B test results indicate that, compared to existing methods, Collaborative Scope significantly improves delivery efficiency by reducing delivery difficulty without hurting order volume. Notably, Collaborative Scope is currently deployed on "Pinhaofan", serving tens of millions of online users.
As education adopts digital platforms, the vast amount of information from various sources, such as learning management systems and learning object repositories, presents challenges in navigation and elaboration. Traditional interfaces involve a steep learning curve, limited user accessibility, and lack flexibility. Language models alone cannot address these issues as they do not have access to structured information specific to the educational organization. In this paper, we propose EDGE (EDucational knowledge Graph Explorer), a natural language interface that uses knowledge graphs to organize educational information. EDGE translates natural language requests into queries and converts the results back into natural language responses. We show EDGE's versatility using knowledge graphs built from public datasets, providing example interactions of different stakeholders. Demo video: https://u.garr.it/eYq63.
Despite the powerful capabilities of GNN-based drug screening models in predicting target drug properties, the black-box nature of these models poses a challenge for practical application, particularly in a field as critical as drug development, where understanding of and trust in AI-driven decisions are important. To address the interpretability issues associated with GNN-based virtual drug screening, we introduce XplainScreen: a unified explanation framework designed to evaluate various explanation methods for GNN-based models. XplainScreen offers a user-friendly, web-based interactive platform that allows for the selection of specific GNN-based drug screening models and multiple cutting-edge explainable AI methods. It supports both qualitative assessments (through visualization and generative text descriptions) and quantitative evaluations of these methods, utilizing drug molecules in SMILES format. This demonstration showcases the utility of XplainScreen through a user study with pharmacological researchers focused on virtual screening tasks based on toxicity, highlighting the framework's potential to enhance the integrity and trustworthiness of AI-driven virtual drug screening. A video demo of XplainScreen is available at https://youtu.be/Q4yobrTLKec, and the source code can be accessed at https://github.com/GeonHeeAhn/XplainScreen.
This paper presents RevEx, an online consumer reviews extraction tool. RevEx extracts the comments section for products in webshops. In contrast to other web scraping tools, it can work with heterogeneous web pages automatically, that is, it does not need any additional information apart from the web page itself. In addition, RevEx is a page-level tool since it only needs to load the web page whose comments have to be extracted. The technique includes a mechanism to group similar DOM nodes and then, once several sets of similar DOM nodes are obtained, an algorithm selects the group of DOM nodes that corresponds to the comments of the web page. The results of the empirical evaluation show an average F1 higher than 88%, and perfect results for around 75% of web pages.
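RevEx's actual node-grouping and selection algorithm is more involved than the abstract can convey; purely as a simplified illustration of the underlying idea (group DOM nodes by a structural signature, then pick the group that looks most like a comments section), one might write something like the following. The signature choice, the text-length filter, and the "largest group wins" rule are assumptions, not RevEx's logic.

```python
from collections import defaultdict
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def guess_comment_nodes(html, min_text_len=40):
    """Group DOM nodes sharing the same (tag, class) signature and return
    the largest group whose members carry a reasonable amount of text."""
    soup = BeautifulSoup(html, "html.parser")
    groups = defaultdict(list)
    for node in soup.find_all(True):
        signature = (node.name, tuple(sorted(node.get("class") or [])))
        text = node.get_text(" ", strip=True)
        if len(text) >= min_text_len:
            groups[signature].append(text)
    return max(groups.values(), key=len) if groups else []

html = """
<div class="review">Great phone, the battery lasts two days and the screen is bright.</div>
<div class="review">Arrived late but the seller was helpful; camera quality is decent.</div>
<div class="ad">Buy now!</div>
"""
print(guess_comment_nodes(html))  # the two review texts
```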
As Large Language Models (LLMs) are integrated into various sectors, ensuring their reliability and safety is crucial. This necessitates rigorous probing and auditing to maintain their effectiveness and trustworthiness in practical applications. Subjecting LLMs to varied iterations of a single query can unveil potential inconsistencies in their knowledge base or functional capacity. However, a tool for performing such audits with an easy-to-execute workflow and a low technical barrier has been lacking. In this demo, we introduce "AuditLLM," a novel tool designed to audit the performance of various LLMs in a methodical way. AuditLLM's primary function is to audit a given LLM by deploying multiple probes derived from a single question, thus detecting any inconsistencies in the model's comprehension or performance. A robust, reliable, and consistent LLM is expected to generate semantically similar responses to variably phrased versions of the same question. Building on this premise, AuditLLM generates easily interpretable results that reflect the LLM's consistency based on a single input question provided by the user. A certain level of inconsistency has been shown to be an indicator of potential bias, hallucinations, and other issues. One could then use the output of AuditLLM to further investigate issues with the aforementioned LLM. To facilitate demonstration and practical use, AuditLLM offers two key modes: (1) Live mode, which allows instant auditing of LLMs by analyzing responses to real-time queries; and (2) Batch mode, which facilitates comprehensive LLM auditing by processing multiple queries at once for in-depth analysis. This tool is beneficial for both researchers and general users, as it enhances our understanding of LLMs' capabilities in generating responses, using a standardized auditing platform.
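The abstract does not describe how AuditLLM scores consistency; the snippet below is only a crude stand-in for that step, using TF-IDF cosine similarity between responses to paraphrased probes (a real audit would use stronger semantic similarity measures, and the `ask_llm` stub replaces an actual model call).

```python
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def consistency_score(responses):
    """Mean pairwise TF-IDF cosine similarity between the responses an LLM
    gave to paraphrases of the same question."""
    vectors = TfidfVectorizer().fit_transform(responses)
    sims = [cosine_similarity(vectors[i], vectors[j])[0, 0]
            for i, j in combinations(range(len(responses)), 2)]
    return sum(sims) / len(sims)

def ask_llm(prompt):
    # Stand-in for the model under audit; replace with a real client.
    return "Paris is the capital of France."

probes = [
    "What is the capital of France?",
    "Which city serves as France's capital?",
    "Name the capital city of France.",
]
responses = [ask_llm(p) for p in probes]
print(f"consistency: {consistency_score(responses):.2f}")  # 1.00 for identical answers
```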
Photo restoration technology enables preserving visual memories in photographs. However, physical prints are vulnerable to various forms of deterioration, ranging from physical damage to loss of image quality. While restoration by human experts can improve the quality of outcomes, it often comes at a high price in terms of cost and time. In this work, we present an AI-based photo restoration framework composed of multiple stages, where each stage is tailored to enhance and restore specific types of photo damage, accelerating and automating the photo restoration process. By integrating these techniques into a unified architecture, our framework aims to offer a one-stop solution for restoring old and deteriorated photographs. Furthermore, we present a novel old-photo restoration dataset, since no publicly available dataset exists for our evaluation.
Fact-checkers are overwhelmed by the volume of claims they need to examine in order to fight misinformation. Even once debunked, a claim may still be spread by people unaware that it is false, or it may be recycled as a source of inspiration by malicious users. Hence the importance of fact-check (FC) retrieval as a research problem: given a claim and a database of previous checks, find the checks relevant to the claim. Existing solutions addressing this problem rely on the strategy of retrieving and re-ranking relevant documents. We have built FactCheckBureau, an end-to-end solution that enables researchers to easily and interactively design and evaluate FC retrieval pipelines. We also present a corpus we have built, which can be used in further research to test fact-check retrieval tools. The source code of our tool is available at this link.
High-quality data is essential for informed public debate. High-quality statistical data sources provide valuable reference information for verifying claims. To assist journalists and fact-checkers, user queries about specific claims should be automatically answered using statistical tables. However, the large number and variety of these sources make this task challenging.
We propose to demonstrate STaR, a novel method for Space and Time-aware STatistic Retrieval, based on a user natural language query. STaR is deployed within our system StatCheck, which we developed and shared with fact-checking journalists. STaR improves the quality of statistic fact retrieval by treating space and time separately from the other parts of the statistics dataset. Specifically, we use them as dimensions of the data (and the query), and focus the linguistic part of our dataset search on the rich, varied language present in the data. Our demonstration uses statistic datasets from France, Europe, and a few beyond, allowing users to query and explore along space and time dimensions.
We present FairRankTune, a multi-purpose open-source Python toolkit offering three primary services: quantifying fairness-related harms, leveraging bias mitigation algorithms, and constructing custom fairness-relevant datasets. FairRankTune provides researchers and practitioners with a self-contained resource for fairness auditing, experimentation, and advancing research. The central piece of FairRankTune is a novel fairness-tunable ranked data generator, RankTune, that streamlines the creation of custom fairness-relevant ranked datasets. FairRankTune also offers numerous fair ranking metrics and fairness-aware ranking algorithms within the same plug-and-play package. We demonstrate the key innovations of FairRankTune, focusing on features that are valuable to stakeholders via use cases highlighting workflows in the end-to-end process of mitigating bias in ranking systems. FairRankTune addresses the gap of limited publicly available datasets, auditing tools, and implementations for fair ranking.
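RankTune's actual generation procedure is not described in the abstract; as a toy illustration of what a fairness-tunable ranked-data generator can look like, the sketch below mixes a group-blind random ranking with a maximally unfair one using a single parameter `phi`. The parameter name and mixing rule are assumptions, not FairRankTune's API.

```python
import random

def generate_ranking(group_sizes, phi, seed=0):
    """Toy fairness-tunable ranking generator.

    phi = 1.0 -> items from all groups are interleaved uniformly at random
    phi = 0.0 -> the last group is pushed to the bottom of the ranking
    Intermediate values mix the two behaviors position by position.
    """
    rng = random.Random(seed)
    items = [(g, i) for g, size in enumerate(group_sizes) for i in range(size)]
    fair = items[:]
    rng.shuffle(fair)
    unfair = sorted(items, key=lambda item: item[0])  # earlier groups ranked first
    ranking, used = [], set()
    for _ in items:
        source = fair if rng.random() < phi else unfair
        pick = next(it for it in source if it not in used)
        ranking.append(pick)
        used.add(pick)
    return ranking

print(generate_ranking(group_sizes=[5, 5], phi=0.2))  # mostly group 0 before group 1
```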
In today's music industry, album cover design is as crucial as the music itself, reflecting the artist's vision and brand. However, many AI-driven album cover services require subscriptions or technical expertise, limiting accessibility. To address these challenges, we developed Music2P, an open-source, multi-modal AI-driven tool that streamlines album cover creation, making it efficient, accessible, and cost-effective through Ngrok. Music2P automates the design process using techniques such as Bootstrapping Language Image Pre-training (BLIP), music-to-text conversion (LP-music-caps), image segmentation (LoRA), and album cover and QR code generation (ControlNet). This paper demonstrates the Music2P interface, details our application of these technologies, and outlines future improvements. Our ultimate goal is to provide a tool that empowers musicians and producers, especially those with limited resources or expertise, to create compelling album covers.
Heatwaves pose significant health risks, particularly due to prolonged exposure to high summer temperatures. The large vulnerable groups, especially pedestrians and cyclists on sun-exposed sidewalks, motivate the development of a route planning method that accounts for somatosensory temperature through shade-ratio consideration. This paper is the first to introduce a pipeline that utilizes segmentation foundation models to extract shaded areas from high-resolution satellite images. These areas are then integrated into a multi-layered road map, enabling users to customize routes based on a balance between distance and shade exposure, thereby enhancing comfort and health during outdoor activities. Specifically, we construct a graph-based representation of the road map, where links indicate connectivity and are updated with shade ratio data for dynamic route planning.
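The abstract leaves the exact routing objective open; a minimal sketch of shade-aware routing on such a graph (the cost blend and `alpha` parameter are assumptions used only to illustrate the distance/shade trade-off) is:

```python
import networkx as nx  # pip install networkx

def shade_aware_route(graph, source, target, alpha=0.5):
    """Blend edge length with sun exposure:
    cost = alpha * length + (1 - alpha) * length * (1 - shade_ratio).
    alpha = 1.0 recovers the shortest path; smaller alpha favors shaded links."""
    def cost(u, v, data):
        return alpha * data["length"] + (1 - alpha) * data["length"] * (1 - data["shade_ratio"])
    return nx.shortest_path(graph, source, target, weight=cost)

G = nx.Graph()
G.add_edge("A", "B", length=100, shade_ratio=0.1)  # short but sunny
G.add_edge("B", "D", length=100, shade_ratio=0.1)
G.add_edge("A", "C", length=120, shade_ratio=0.9)  # longer but shaded
G.add_edge("C", "D", length=120, shade_ratio=0.9)

print(shade_aware_route(G, "A", "D", alpha=1.0))  # ['A', 'B', 'D'] (pure distance)
print(shade_aware_route(G, "A", "D", alpha=0.2))  # ['A', 'C', 'D'] (prefers shade)
```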
Understanding the skills and proficiency levels required for various roles is crucial for effective workforce planning, learning, and development. In this paper, we propose a robust skill proficiency modeling framework that offers a structured method to help describe, assess, and develop proficiency in key skills, facilitating individuals' career pathways and aiding organizations in talent management and adaptability. We first design a skill proficiency description pipeline, which generates statements describing the requirements at each proficiency level of a skill. Following this, we build a skill proficiency by occupation model using large-scale job ad data to help organizations and individuals understand the skill proficiency requirements for different roles. Finally, we design a visual analytics system, based on a real-world career pathway scenario, to demonstrate the practical usefulness and effectiveness of our framework. A demo video is available at www.dropbox.com/scl/fi/nd0f3vi03n12g4y0sluaw/cikm24_demo.mp4?rlkey=55vya144q5ftai1uqqaubr5u5.
In recent years, researchers have developed several methods to automate discovering datasets and augmenting features for training Machine Learning (ML) models. Together with feature selection, these efforts have paved the way towards what is termed the feature discovery process. Data scientists and engineers use automated feature discovery over tabular datasets to add new features from different sources and enrich training data. By surveying data practitioners, we have observed that automated feature discovery approaches do not allow data scientists to use their domain knowledge during the feature discovery process. In addition, automated feature discovery methods can leak private features or introduce biased ones.
In this paper, we introduce the first user-driven human-in-the-loop feature discovery method, called HILAutoFeat. We demonstrate the capabilities of HILAutoFeat, which effectively combines automated feature discovery with user-driven insights. Our demonstration is centred around two scenarios: (i) an automated feature discovery scenario -- HILAutoFeat acts as a steward in a large data lake where the user is unaware of the quality and relevance of the data, and (ii) a scenario where HILAutoFeat and the user work together -- the user drives the feature discovery process by adding their domain and business knowledge, while HILAutoFeat performs the intensive computations.
While witnessing the exceptional success of machine learning (ML) technologies in many applications, users are starting to notice a critical shortcoming of ML: correlation is a poor substitute for causation. The conventional way to discover causal relationships is to use randomized controlled experiments (RCTs); in many situations, however, these are impractical or sometimes unethical. Causal learning from observational data offers a promising alternative. Although relatively recent, causal learning aims to go far beyond conventional machine learning, yet several major challenges remain. Unfortunately, advances are hampered by the lack of unified benchmark datasets, algorithms, metrics, and evaluation service interfaces for causal learning. In this paper, we introduce CausalBench, a transparent, fair, and easy-to-use evaluation platform, aiming to (a) enable the advancement of research in causal learning by facilitating scientific collaboration on novel algorithms, datasets, and metrics and (b) promote scientific objectivity, reproducibility, fairness, and awareness of bias in causal learning research. CausalBench provides services for benchmarking data, algorithms, models, and metrics, addressing the needs of a broad range of scientific and engineering disciplines.
Mining dense subgraphs from a big graph is important in applications such as community (or module) detection in social (or biological) networks. While most dense structures are defined on undirected graphs, recent efforts have generalized these notions to directed graphs. In this demonstration paper, we present DirDense, an interactive tool that makes it easy for end-users to mine dense structures from a big directed graph. DirDense currently supports the mining of maximal (γ1, γ2)-quasi-cliques, maximal (k1, k1)-plexes, and the directed densest subgraph. DirDense facilitates parameter tuning for each type of structure-mining task, and provides intuitive interfaces to visualize and examine the dense directed structures. Using real-world data, we showcase how users can mine dense directed structures by parameter tuning in DirDense, and how they can conveniently examine these structures and cascade the mining tasks to find progressively larger dense subgraphs more quickly.
Antimicrobial resistance (AMR) poses potentially critical health issues for human and animal populations in the near future. To meet this challenge, we need to adopt a "One Health" strategy, which involves studying and linking information from human and animal populations, as well as from the environment.
In this demonstration, we present an early prototype of the PROMISE platform, which we are developing for One Health data management and analytics, to enable experts from different fields to gain insights into AMR. It is designed to handle data from 25 academic networks and 42 partners. Our demonstration illustrates the capabilities of our methodology for analyzing these data. The user is freed from considerations related to data heterogeneity, as interoperability issues are managed by the platform. Additionally, each data provider will be able to stay within their own vocabulary, regardless of the taxonomies used by other data providers.
We present the Dialogue-based Knowledge-oriented Programming system (DiaKoP), a system with a chat interface designed for multi-turn knowledge base question answering (KBQA). DiaKoP enables users to decompose complex questions into multiple simpler follow-up questions and interact with the system to obtain answers. Multi-turn KBQA presents unique challenges because users may switch topics or ask incomplete questions that rely on previous interactions. To address this, we develop a Dialogue History Tracker and Dialogue Policy to manage user conversations effectively. Additionally, we enhance the knowledge from the knowledge graph by integrating parametric knowledge from a large language model (LLM) to provide more comprehensive answers. To mitigate the issue of questions wrongly parsed by the semantic parser, we implement a human-in-the-loop mechanism, allowing users to correct errors. We evaluate DiaKoP both qualitatively and quantitatively, with a user study indicating that our system better meets users' needs. DiaKoP is open-sourced at https://github.com/THU-KEG/DiaKoP with a guiding demo at https://youtu.be/Tq17k0OxPVg.
The advent of Large Language Models (LLMs) provides an opportunity to change the way queries are processed, moving beyond the constraints of conventional SQL-based database systems. However, using an LLM to answer a prediction query is still challenging, since an external ML model has to be employed and inference has to be performed in order to provide an answer. This paper introduces LLM-PQA, a novel tool that addresses prediction queries formulated in natural language. LLM-PQA is the first to combine the capabilities of LLMs and a retrieval-augmented mechanism to serve prediction queries by integrating data lakes and model zoos. This integration provides users with access to a vast spectrum of heterogeneous data and diverse ML models, facilitating dynamic prediction query answering. In addition, LLM-PQA can dynamically train models on demand, based on specific query requirements, ensuring reliable and relevant results even when no pre-trained model suitable for the task is available in a model zoo.
Traditional diagnosis of chronic diseases involves in-person consultations with physicians to identify the disease. However, there is a lack of research focused on prediction and application systems that use clinical notes and blood test values. We collected five years of Electronic Health Records (EHRs) from Taiwan's hospital database between 2017 and 2021 as an AI database. Furthermore, we developed an EHR-based chronic disease prediction platform utilizing Large Language Multimodal Models (LLMMs), successfully integrating it with frontend web and mobile applications for prediction. This prediction platform can also connect to the hospital's backend database, providing physicians with real-time risk assessment diagnostics. The demonstration link can be found at https://www.youtube.com/watch?v=oqmL9DEDFgA.
Vertical partitioning is a crucial physical design strategy in databases that enhances data management and retrieval through optimal data placement. However, current research often overlooks the use of query predicates for effective data block allocation, resulting in potential performance bottlenecks. Moreover, selecting an appropriate partitioning technique based solely on historical experimental results from research articles is challenging due to variability in storage devices, evaluation metrics, and database schemas. We propose PARS to address these issues by offering end-to-end input/output, customizable database configurations, and prioritized optimization objectives to aid database administrators (DBAs) in making informed partitioning decisions. Additionally, PARS introduces a novel algorithm that leverages both numeric and non-numeric query predicates to partition the tablespace into finer data blocks, reducing query latency by 36.1% when benchmarked against the state-of-the-art (SOTA) method.
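PARS's partitioning algorithm itself is not detailed in the abstract; for orientation only, the snippet below shows a classic baseline for predicate-driven vertical partitioning (columns referenced by the same set of queries are co-located), which is the kind of decision such a system refines. The workload encoding and grouping rule are assumptions, not PARS's algorithm.

```python
from collections import defaultdict

def partition_by_access_pattern(table_columns, workload):
    """Baseline vertical partitioning: co-locate columns that are referenced
    by exactly the same set of queries, so each query touches few blocks."""
    usage = {col: frozenset(q for q, cols in enumerate(workload) if col in cols)
             for col in table_columns}
    partitions = defaultdict(list)
    for col, queries in usage.items():
        partitions[queries].append(col)
    return list(partitions.values())

columns = ["id", "name", "price", "stock", "description"]
workload = [
    {"id", "name", "price"},          # Q0: product listing
    {"id", "name", "price"},          # Q1: price filter
    {"id", "stock", "description"},   # Q2: detail page
]
print(partition_by_access_pattern(columns, workload))
# [['id'], ['name', 'price'], ['stock', 'description']]
```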
In recent years, many studies have successfully integrated transfer learning techniques to improve the performance of Bayesian optimization. However, these advanced methods have not been widely adopted in real-world applications due to their inherent complexity and challenges in re-implementation and reproducibility. In this work, we introduce OpenTOS, an open-source system designed for transfer learning in Bayesian optimization. OpenTOS introduces a new implementation paradigm for these methods, allowing users to build different algorithms by choosing algorithmic components, similar to assembling LEGO blocks. Additionally, OpenTOS provides robust data management for supporting transfer learning with data from various sources. We also developed a web interface that allows for interactive building, analysis, and visualization of the optimization process. Powered by an LLM, this interface offers a conversational experience, allowing users to interact with the system through natural language dialogue. OpenTOS is available as open source at https://github.com/COLA-Laboratory/TransOPT.
High-quality data is essential for data science and machine learning applications, but unfortunately, real-world data often contains significant amounts of errors, such as typos, missing values, and data inconsistencies. Despite all the efforts in cleaning data using either logical or learning-based methods, in practice data cleaning still requires high human cost, either for manually providing data repair rules or for preparing labeled datasets to train machine learning models. In this paper, we introduce GARF, a novel data cleaning system based on sequence generative adversarial networks (SeqGAN). One key type of information GARF tries to learn is data repair rules. To automatically extract data repair rules from dirty data, GARF employs a SeqGAN to capture dependency relationships and converts the machine-learned information into data repair rules that humans can interpret. Additionally, considering that both the generated rules and the data may not be fully trusted, GARF provides a co-cleaning process that iteratively updates inaccurate rules and repairs dirty data until no tuple violates the rules. We have implemented and deployed GARF as an open-source system and demonstrated its usability for data cleaning in real-world scenarios.
The rapid growth in the number of news articles published daily can create challenges for users to explore specific topics and gather different perspectives around the topics to make neutral and unbiased conclusions. The system's ability to intelligently cluster news articles from multiple sources and retrieve concise (pro/con) relevant arguments is necessary for users' well-informed decision-making. In this paper, we introduce our unified argument retrieval system that uses our clustering model to cluster news articles and subsequently extracts the core arguments from news articles using the argument prediction model. We conducted a user study to understand the system's usability and users' satisfaction with the quality of clusters and arguments extracted.
Advancements in sensor technology are leading to massive collection of tracking data in sports. There is an increasing interest in analyzing the tracking data to gain competitive advantage. Analyzing and labeling key game moments can provide deep insights into player performance, team dynamics, as well as game strategy. However, the process of manually labeling and analyzing these moments is costly and time-consuming. In this paper, we describe a visual interface for user-friendly and efficient labeling of key moments in basketball games aided by neural networks. We report results of a user study evaluating the labeling interface.
Variant calling is a fundamental task that involves identifying variants in an individual's genome compared to the reference genome. Knowing these variants is critical for assessing an individual's risk for diseases such as cancer and developing new treatments. Due to the large size of human genome sequences, processing and analyzing them requires significant compute and storage resources. Cluster computing is an attractive solution for processing a large workload of human genomes. In this paper, we present a scalable tool for democratizing variant calling on human genome sequences using testbeds that are available for academic research at no charge. Our tool can (a) execute two types of variant calling pipelines in a commodity cluster with CPUs and graphics processing units (GPUs); (b) enable improved cluster utilization and faster execution via asynchronous computations, minimal synchronization, and mutual exclusion when employing GPUs; and (c) execute variant calling pipelines of multiple users concurrently. Using publicly available human genome sequences, users can interactively experience the unique features of our tool, which has a low barrier to entry for large-scale variant calling.
The Text-to-SQL problem aims at developing natural language query interfaces for relational database systems by converting the text input into executable SQL queries. Recently, using Large Language Models (LLMs) has emerged as a new paradigm for the Text-to-SQL problem. To this end, the LLM needs to understand not only the user input but also information from the database. In this demo, we present multi-agent SQL (MageSQL), an LLM-based Text-to-SQL approach that tackles the task by orchestrating multiple agents in a pipeline. We will showcase a user-friendly interface to demonstrate the inner workings of our approach that allows users to add and modify agents with different functionalities, customize prompts, and see their impact on specific examples. Through several use cases, we will demonstrate how to (i) construct a Text-to-SQL pipeline with multiple agents; (ii) generate prompts for the LLM with various templates and strategies; and (iii) monitor the results of natural language queries and perform debugging.
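The agents and their interfaces in MageSQL are not spelled out in the abstract; a minimal sketch of the general "agents in a pipeline over a shared state" pattern (agent names, the toy schema-linking heuristic, and the stubbed SQL generator are all assumptions, not MageSQL's components) is:

```python
def schema_linker(state):
    # Toy heuristic: keep only tables whose names appear in the question.
    state["tables"] = [t for t in state["schema"] if t in state["question"].lower()]
    return state

def prompt_builder(state):
    state["prompt"] = (f"Schema: {state['tables']}\n"
                       f"Question: {state['question']}\nSQL:")
    return state

def sql_generator(state):
    # Stand-in for an LLM call; replace with a real client using state["prompt"].
    state["sql"] = f"SELECT * FROM {state['tables'][0]};"
    return state

def run_pipeline(state, agents):
    """Run each agent in order; every agent reads and updates the shared state."""
    for agent in agents:
        state = agent(state)
    return state

state = {"question": "List all orders placed yesterday", "schema": ["orders", "users"]}
result = run_pipeline(state, [schema_linker, prompt_builder, sql_generator])
print(result["sql"])  # SELECT * FROM orders;
```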
In recent years, multi-graphs have been gaining increasing popularity due to their ability to better capture the multi-faceted information of real-world graphs. This, in turn, has consistently provided superior insights and performance in related machine learning tasks. However, the analysis of real-world multi-graphs and the development of multi-graph methods are currently stifled by a few limitations. On one side, researchers often struggle to properly evaluate the performance of the multi-graph methods they design due to a lack of high-quality benchmarks as well as a lack of tools that allow for efficient and seamless experimentation. On the other side, practitioners aiming to analyze real-world multi-graphs often struggle to obtain robust insights due to a lack of high-quality multi-graph methods. To this end, we present Multi-Graph Explorer: a MATLAB software package designed to offer a user-friendly yet comprehensive, flexible, and extensible workflow for multi-graph analysis, aiming to break these barriers and accelerate progress in machine learning tasks involving multi-graphs.
We present LINKin-PARK, an innovative system that seamlessly merges geographic visualization with an advanced Dual Attention Double Channel Convolutional Neural Network with Multilayer Perceptron (Dual Attention-DCCNN+MLP) to facilitate the efficient analysis of land valuation. LINKin-PARK provides robust visualization capabilities for intuitive comprehension. Our model outperforms traditional methods, e.g., linear regression, multilayer perceptron (MLP), Extreme Gradient Boosting (XGBoost), and the combination of a CNN (Convolutional Neural Network) with an MLP. An ablation study further evaluates the influence of specific components within the model, revealing that spatial and channel-wise attention mechanisms and the integration of DCCNN and skip connections are crucial for capturing spatial details and improving prediction accuracy. Users have the flexibility to explore and predict developable land valuation based on their specific requirements and provide feedback to minimize errors in model prediction. For instance, the system can forecast future development potential and market demand for any location in an urban space, enabling users to make informed decisions before purchasing a property. Similarly, retailers can anticipate future revenues to aid in strategic decisions, such as selecting optimal locations for establishing new retail outlets. In summary, LINKin-PARK effectively combines geographic visualization and Dual Attention-DCCNN+MLP to assist users in analyzing and predicting land valuation and other scenarios.
We present Event-focused Search, an automated and scalable pipeline designed to facilitate event discovery and enhance event-based search. This is done by leveraging large language models (LLMs) to populate event datasets, perform temporal search based on selected dates, and aggregate search results around the events relevant to those searches. We illustrate this pipeline through proof-of-concept interfaces in an e-commerce context, though such a framework is applicable to other types of search scenarios (e.g., sports, entertainment).
Traditional legal retrieval systems designed to retrieve legal documents, statutes, precedents, and other legal information are unable to give satisfactory answers due to a lack of semantic understanding of specific questions. Large Language Models (LLMs) have achieved excellent results in a variety of natural language processing tasks, which inspired us to train an LLM for the legal domain to support legal retrieval. However, in the Chinese legal domain, due to the complexity of legal questions and the rigour of legal articles, there is not yet a legal large language model with satisfactory practical applicability. In this paper, we present DeliLaw, a Chinese legal counselling system based on a large language model. DeliLaw integrates a legal retrieval module and a case retrieval module to overcome model hallucination. Users can consult on professional legal questions, search for legal articles and relevant judgement cases, etc., on the DeliLaw system in a dialogue mode. In addition, DeliLaw supports the use of English for counseling. The system is available at: https://data.delilegal.com/lawQuestion.
myCADI is a machine learning framework with an associated graphical interface for discovering and understanding the internal structure of an unsupervised dataset. It is an intuitive end-user interface to the CADI approach, which uses a revised version of the Isolation Forest method to 1) identify local anomalies, 2) reconstruct the cluster-based internal structure of the data, and 3) provide end-users with explanations of how anomalies deviate from the found clusters. myCADI takes numerical data as input and is structured around several interfaces, each of which displays a ranked list of the found anomalies, a description of the subspaces in which the different clusters lie, and feature attribution explanations to ease the interpretation of anomalies. These explanations make explicit why a selected point is considered to be a local anomaly of one (or more) cluster(s). The framework also provides dataset and tree visualizations.
Parameter-Efficient Fine-Tuning (PEFT) adapts large language models (LLMs) to specific domains by updating only a small portion of the parameters. To easily and efficiently adapt LLMs to custom domains, we present a no-code fine-tuning platform, GongBu, supporting 9 PEFT methods and open-source LLMs. GongBu allows LLM fine-tuning through a user-friendly GUI, eliminating the need to write any code. Its features include data selection, accelerated training speed, decoupled deployment, performance monitoring, and error log analysis. The demonstration video is available at https://www.youtube.com/watch?v=QuDR_WNoB9o.
This paper introduces Mastodoner, a command-line tool and Python library aimed at simplifying access to public data on Mastodon, a prominent player in the Fediverse --- a decentralized network of interconnected social media platforms. Mastodoner addresses the challenges posed by Mastodon's decentralized nature by providing a unified interface for data collection, instance discovery, and secure data sharing. Through examples and demonstrations, this paper illustrates Mastodoner's capabilities in facilitating researchers' access to and analysis of public Mastodon data, thus advancing research in decentralized social media analytics. The tool and documentation are available at: https://github.com/harisbinzia/mastodoner.
Poor data quality significantly affects different data analytics tasks, leading to inaccurate decisions and poor predictions of the machine learning models. Outliers represent one of the most common data glitches that impact data quality. While detecting outliers in numerical data has been extensively studied, few attempts were made to solve the problem of detecting categorical outliers. In this paper, we introduce DetCat for detecting categorical outliers in relational datasets, by utilizing the syntactic structure of the values. For a given attribute, DetCat identifies a set of patterns that represents the majority of the values as dominating patterns. Data values that cannot be generated by the dominating patterns are declared as outliers. The demo will show the effectiveness of our tool in detecting categorical outliers and discovering the syntactical data patterns.
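DetCat's pattern language and selection criteria are richer than the abstract suggests; as a simplified illustration of the dominating-pattern idea (the letter/digit abstraction and the coverage threshold are assumptions), consider:

```python
import re
from collections import Counter

def to_pattern(value):
    """Abstract a value into a coarse syntactic pattern:
    letter runs -> 'L', digit runs -> 'D', other characters kept literally."""
    value = re.sub(r"[A-Za-z]+", "L", value)
    return re.sub(r"[0-9]+", "D", value)

def categorical_outliers(values, dominance=0.8):
    """Flag values whose pattern falls outside the smallest set of patterns
    covering at least `dominance` of the column (the dominating patterns)."""
    patterns = [to_pattern(v) for v in values]
    counts = Counter(patterns)
    covered, dominating = 0, set()
    for pattern, count in counts.most_common():
        if covered / len(values) >= dominance:
            break
        dominating.add(pattern)
        covered += count
    return [v for v, p in zip(values, patterns) if p not in dominating]

zip_codes = ["75001", "10115", "28013", "1000-205", "SW1A 1AA", "75002",
             "10117", "28014", "75003", "75004", "75005", "75006"]
print(categorical_outliers(zip_codes))  # ['1000-205', 'SW1A 1AA']
```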
As information retrieval systems continue to evolve, accurate evaluation and benchmarking of these systems become pivotal. Web search datasets, such as MS MARCO, primarily provide short keyword queries without accompanying intent or descriptions, posing a challenge in comprehending the underlying information need. This paper proposes an approach to augmenting such datasets to annotate informative query descriptions, with a focus on two prominent benchmark datasets: TREC-DL-21 and TREC-DL-22. Our methodology involves utilizing state-of-the-art LLMs to analyze and comprehend the implicit intent within individual queries from benchmark datasets. By extracting key semantic elements, we construct detailed and contextually rich descriptions for these queries. To validate the generated query descriptions, we employ crowdsourcing as a reliable means of obtaining diverse human perspectives on the accuracy and informativeness of the descriptions. This information can be used as an evaluation set for tasks such as ranking, query rewriting, or others.
We present 3DLNews, a novel dataset with local news articles from the United States spanning the period from 1996 to 2024. It contains almost 1 million URLs (with HTML text) from over 14,000 local newspapers, TV, and radio stations across all 50 states, and provides a broad snapshot of the US local news landscape. The dataset was collected by scraping Google and Twitter search results. We employed a multi-step filtering process to remove non-news article links and enriched the dataset with metadata such as the names and geo-coordinates of the source news media organizations, article publication dates, etc. Furthermore, we demonstrated the utility of 3DLNews by outlining four applications.
Accurately judging students' ongoing performance is crucial for adaptive teaching. In this work, we focus on the task of automatically predicting students' levels of mastery of math questions from teacher-student classroom dialogue data in online one-on-one classes. As a step toward this direction, we introduce the Multi-turn Classroom Dialogue (MCD) dataset as a benchmark for testing the ability of machine learning models to judge student performance from classroom conversations. Our dataset contains aligned multi-turn spoken language from 5000+ unique samples of solving grade-8 math questions, collected from 500+ hours' worth of online one-on-one tutoring classes. In our experiments, we assess various state-of-the-art models on the MCD dataset, highlighting the importance of understanding multi-turn dialogues and handling noisy ASR transcriptions. Our findings demonstrate the dataset's utility in advancing research on automated student performance assessment. To encourage reproducible research, we make our data publicly available at https://github.com/ai4ed/MCD.
News articles constitute a valuable resource for opinion mining, as they contain important perspectives related to the subject matter they cover. In this paper, we explore how aspect-based sentiment analysis might help in understanding the public discourse surrounding agricultural biotechnologies in Africa. We introduce BioMAISx, the first English language dataset composed of direct quotes pertaining to agricultural biotechnologies extracted from a curated list of Africa-based news sources. We have identified and labelled entities related to key aspects of agricultural biotechnologies, providing valuable insights into public discourse. This dataset can aid in identifying challenges, improving public discourse, and monitoring the perception of agricultural biotechnologies, thus contributing to informed decision-making.
This paper focuses on the generation of spatiotemporal data from real-world observations to represent the evolution of phenomena of interest as moving regions. The case study is the creation of a dataset to represent the spread of a controlled forest fire from aerial images captured using a drone. We present an overview of the data acquisition and preparation steps and describe the optimization strategy implemented to establish a vertex correspondence between the regions that delimit the burned region at discrete time instants. The resulting dataset is used to create a continuous representation of the evolution of the burned region over time.
Privacy is critical when dealing with user-generated text, as is common in Natural Language Processing (NLP) and Information Retrieval (IR) tasks. Documents, queries, posts, and reviews might pose a risk of inadvertently disclosing sensitive information. Such exposure of private data is a significant threat to user privacy, as it may reveal information that users prefer to keep confidential. The leading framework for protecting user privacy when handling textual information is ε-Differential Privacy (DP). However, the research community lacks a unified framework for comparing different DP mechanisms. This study introduces pyPANTERA, an open-source Python package developed for text obfuscation. The package is designed to incorporate state-of-the-art DP mechanisms within a unified framework for obfuscating data. pyPANTERA is designed not only as a modular and extensible library for enriching DP techniques, thereby enabling the integration of new DP mechanisms in future research, but also to allow reproducible comparison of the current state-of-the-art mechanisms. Through extensive evaluation, we demonstrate the effectiveness of pyPANTERA, making it an essential resource for privacy researchers and practitioners. The source code of the library and of the experiments is available at: https://github.com/Kekkodf/pypantera.
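pyPANTERA's own API is not reproduced here; the snippet below only illustrates the family of mechanisms such a package wraps, namely word-level metric-DP obfuscation that perturbs a word embedding with noise whose density decays as exp(-ε·||z||) and outputs the nearest vocabulary word. The toy embeddings and sampling details are assumptions for illustration.

```python
import numpy as np

def noisy_word(word, embeddings, epsilon, rng):
    """One step of a generic metric-DP word obfuscation mechanism: perturb the
    word embedding and return the vocabulary word nearest to the noisy point."""
    dim = len(embeddings[word])
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=dim, scale=1.0 / epsilon)  # ||z|| ~ Gamma(d, 1/eps)
    noisy = np.asarray(embeddings[word]) + magnitude * direction
    return min(embeddings, key=lambda w: np.linalg.norm(noisy - np.asarray(embeddings[w])))

# Toy 2-d "embeddings"; real mechanisms use pretrained word vectors.
emb = {"doctor": [0.9, 0.1], "nurse": [0.8, 0.2], "hospital": [0.7, 0.4], "car": [-0.9, -0.8]}
rng = np.random.default_rng(0)
print([noisy_word(w, emb, epsilon=5.0, rng=rng) for w in ["doctor", "hospital"]])
```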
With the growing success of Large Language Models (LLMs) in information-seeking scenarios, search engines are now adopting generative approaches to provide answers along with in-line citations as attribution. While existing work focuses mainly on attributed question answering, in this paper, we target information-seeking scenarios, which are often more challenging due to the open-ended nature of the queries and the size of the label space in terms of the diversity of candidate attributed answers per query. We propose a reproducible framework to evaluate and benchmark attributed information seeking, using any backbone LLM and different architectural designs: (1) Generate, (2) Retrieve then Generate, and (3) Generate then Retrieve. Experiments using HAGRID, an attributed information-seeking dataset, show the impact of the different scenarios on both the correctness and attributability of answers.
Multi-modal knowledge graphs (MMKGs), which ground various non-symbolic data (e.g., images and videos) into symbols, have attracted attention as resources enabling knowledge processing and machine learning across modalities. However, the construction of MMKGs for videos consisting of multiple events, such as daily activities, is still in the early stages. In this paper, we construct an MMKG based on synchronized multi-view simulated videos of daily activities. Besides representing the content of daily life videos as event-centric knowledge, our MMKG also includes frame-by-frame fine-grained changes, such as bounding boxes within video frames. In addition, we provide support tools for querying our MMKG. As an application example, we demonstrate that our MMKG facilitates benchmarking vision-language models by providing the necessary vision-language datasets for a tailored task.
Multiple versions of the same dataset can exist in a data repository (e.g., data warehouses, data lakes, etc.), mainly because of the interactive and collaborative nature of data science. Data creators generally update existing datasets and upload them as new datasets to data repositories without proper documentation. Identifying such versions helps in data management, data governance, and making better decisions using data. However, there is a dearth of benchmarks to develop and evaluate data versioning techniques, which requires a lot of human effort. Thus, this work introduces a novel framework to generate benchmarks for data versioning using Generative AI (specifically Large Language Models). The proposed framework offers properties that existing benchmarks do not have, including proper documentation, version lineage, and complex transformations generated by an LLM. We also share VerLLM-v1, the first version of the benchmark that features these properties, and compare it to existing benchmarks.
For the sentence-level sentiment classification task, learning Contrastive Discourse Relations (CDRs) like a-but-b is difficult for Deep Neural Networks (DNNs) via purely data-driven training. Several methods exist in the literature for dissemination of CDR information with DNNs, but there is no dedicated dataset available to effectively test their dissemination performance. In this paper, we propose a new large-scale dataset for this purpose called Covid19-twitter, which contains around 100k tweets symmetrically divided into various categories. Instead of manual annotation, we used a combination of an Emoji analysis and a lexicon-based tool called Valence Aware Dictionary and sEntiment Reasoner (VADER) to perform automatic labelling of the tweets, while also ensuring high accuracy of the annotation process through some quality checks. We also provide benchmark performances of several baselines on our dataset for both the sentiment classification and CDR dissemination tasks. We believe that this dataset will be valuable for discourse analysis research in sentiment classification.
Understanding how urban parks are utilized and perceived by the public is crucial for effective urban planning and management. This study introduces a novel dataset derived from Instagram, using 42,187 images tagged with #Seoul and #Park hashtags from 2017 to 2023. These images were filtered using InternLM-XComposer2, a Multimodal Large Language Model (MLLM), to confirm they depicted park scenes. GPT-4 then annotated the filtered images, resulting in 29,866 valid image annotations of physical elements, human activities, animals, and emotions. The dataset is publicly available at https://huggingface.co/datasets/RedBall/seoul-urban-park-analysis-by-llm.
This work introduces EUvsDisinfo, a multilingual dataset of disinformation articles originating from pro-Kremlin outlets, along with trustworthy articles from credible / less biased sources. It is sourced directly from the debunk articles written by experts leading the EUvsDisinfo project. Our dataset is the largest to-date resource in terms of the overall number of articles and distinct languages. It also provides the largest topical and temporal coverage. Using this dataset, we investigate the dissemination of pro-Kremlin disinformation across different languages, uncovering language-specific patterns targeting certain disinformation topics. We further analyse the evolution of topic distribution over an eight-year period, noting a significant surge in disinformation content before the full-scale invasion of Ukraine in 2022. Lastly, we demonstrate the dataset's applicability in training models to effectively distinguish between disinformation and trustworthy content in multilingual settings.
Legal question answering based on case documents is a pivotal legal AI application that helps extract key elements from legal case documents to support downstream tasks. Intuitively, the form of this task is similar to legal machine reading comprehension. However, in existing legal machine reading comprehension datasets, the background information is much shorter than legal case documents, and the questions are not designed from the perspective of legal knowledge. In this paper, we present LeDQA, the first Chinese legal case document-based question answering dataset to the best of our knowledge. Specifically, we build a comprehensive question schema (including 48 element-based questions) for Chinese civil law with legal professionals. Considering that human annotation is prohibitively expensive, we use one of the SOTA LLMs (i.e., GPT-4) to annotate the sentences relevant to these questions in each case document. The constructed dataset originates from Chinese civil cases and contains 100 case documents, 4,800 case-question pairs, and 132,048 sentence-level relevance annotations. We implement several text matching algorithms for relevant sentence selection and various Large Language Models (LLMs) for legal question answering on LeDQA. The experimental results indicate that incorporating relevant sentences can benefit the performance of question answering models, but further efforts are still required to address the remaining challenges, such as the retrieval of irrelevant sentences and incorrect reasoning over retrieved sentences.
Multimodal social network user sentiment analysis aims to determine users' emotional polarity (positive or negative) by mining the associations between multiple data types such as images and texts. Existing public datasets are mainly constructed from English social media platforms, while Chinese social media datasets for multimodal user sentiment analysis are extremely scarce. In posts published by Chinese social media users, it is not rare that the emotional polarity delivered by the image and the textual content is inconsistent. Given such emotional inconsistency between images and texts, how to effectively identify users' true emotional polarity is still challenging. To address the above issues, in this paper, we first construct a Chinese social media dataset, CH-Mits, for multimodal user sentiment analysis. In order to evaluate the usability of the dataset, we conceive and implement a novel model called PEMNet, and compare it with state-of-the-art models on the CH-Mits dataset. Finally, we analyze the performance of PEMNet on selected samples with emotional inconsistency between images and texts. The constructed dataset and the code for PEMNet are available at https://github.com/Marblrdumdore/CH-Mits.
Due to its collaborative nature, Wikidata is known to have a complex taxonomy, with recurrent issues like the ambiguity between instances and classes, the inaccuracy of some taxonomic paths, the presence of cycles, and the high level of redundancy across classes. Manual efforts to clean up this taxonomy are time-consuming and prone to errors or subjective decisions. We present WiKC, a new version of Wikidata taxonomy cleaned automatically using a combination of Large Language Models (LLMs) and graph mining techniques. Operations on the taxonomy, such as cutting links or merging classes, are performed with the help of zero-shot prompting on an open-source LLM. The quality of the refined taxonomy is evaluated from both intrinsic and extrinsic perspectives, on a task of entity typing for the latter, showing the practical interest of WiKC.
We present AnnoRank, a web-based user interface (UI) framework designed to facilitate the collection of crowdsourced annotations in the context of information retrieval. AnnoRank enables the collection of explicit and implicit annotations for a specified query and one or multiple documents, allowing for the observation of user-selected items and the assignment of relevance judgments. Furthermore, AnnoRank supports ranking comparisons, enabling the visualization and evaluation of ranked lists generated by different fairness interventions, along with their utility and fairness metrics. Fairness interventions in the annotation pipeline are necessary to prevent the propagation of bias when a user selects the top-k items in a ranked list. Given the widespread use of ranking systems, the application supports multimodality through text and image document formats. We also support the assessment of inter-annotator agreement to ensure the quality of the annotations. AnnoRank is integrated with the Ranklib library, offering a wide range of ranking models that can be applied to the data and displayed in the UI. AnnoRank is designed to be flexible, configurable, and easy to deploy to meet diverse annotation needs in information retrieval. AnnoRank is publicly available as open-source software, together with detailed documentation, at https://github.com/ClaraRus/AnnoRank.
Recent advancements in Chain-of-Thought (CoT) and Program-of-Thought (PoT) methods have greatly enhanced language models' mathematical reasoning capabilities, facilitating their integration into instruction tuning datasets with LLMs. However, existing methods for large-scale dataset creation require substantial seed data and high computational costs for data synthesis, posing significant challenges for scalability. We introduce InfinityMATH, a scalable instruction tuning dataset for programmatic mathematical reasoning. The construction pipeline emphasizes decoupling numbers from mathematical problems to synthesize number-independent programs, enabling efficient and flexible scaling while minimizing dependency on specific numerical values. Fine-tuning experiments with open-source language and code models, such as Llama2 and CodeLlama, demonstrate the practical benefits of InfinityMATH. These fine-tuned models show significant relative improvements on both in-domain and out-of-domain benchmarks, ranging from 184.7% to 514.3% on average. Additionally, they exhibit high robustness on the GSM8K+ and MATH+ benchmarks, which are enhanced versions of the test sets with simple number variations. InfinityMATH ensures that models are more versatile and effective across a broader range of mathematical problems. The data is available at https://huggingface.co/datasets/flagopen/InfinityMATH.
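To illustrate the number-decoupling idea described above, the following minimal sketch (our own illustration, not code from the InfinityMATH pipeline) replaces the literals in a word problem with named parameters so that one reasoning program can be instantiated with arbitrary numbers:

```python
# Illustrative sketch of a number-independent program: the problem
# "A store sells 12 apples per crate. How many apples are in 7 crates?"
# becomes a template whose numbers are parameters.
problem_template = (
    "A store sells {apples_per_crate} apples per crate. "
    "How many apples are in {crates} crates?"
)

def solution(apples_per_crate: int, crates: int) -> int:
    # The reasoning program never mentions concrete numbers,
    # so it stays valid for any instantiation.
    return apples_per_crate * crates

# Instantiating the same template/program with many number sets
# yields many training samples at negligible extra cost.
for apples_per_crate, crates in [(12, 7), (8, 25), (30, 4)]:
    question = problem_template.format(apples_per_crate=apples_per_crate, crates=crates)
    print(question, "->", solution(apples_per_crate, crates))
```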
Time series anomaly detection is of significant importance in many real-world applications, including finance, healthcare, network security, industrial equipment, complex computing systems, and space probes. Most of these applications involve multi-sensor systems, so multivariate time series anomaly detection (MTSAD) has garnered widespread attention. This broad attention has fueled extensive research aimed at developing methods and techniques, including both classic machine learning and deep learning methods, to improve the efficiency and precision of anomaly detection on multivariate time series data. However, evaluating the performance of these methods remains challenging due to the limited availability of public benchmark datasets for MTSAD, which are often criticized for various shortcomings. Additionally, there is no consensus on the best metrics for time series anomaly detection, further complicating MTSAD research. In this paper, we advance the benchmarking of time series anomaly detection by addressing datasets, evaluation metrics, and algorithm comparison. To the best of our knowledge, we have generated the largest real-world datasets for MTSAD, using the Hologres AIOps system on the Alibaba Cloud platform. We review and compare popular evaluation metrics, including recently proposed ones. To evaluate classic machine learning and recent deep learning methods fairly, we have conducted extensive comparisons of these methods on various datasets. We believe that our benchmarks and datasets will promote reproducible results and accelerate the progress of MTSAD research.
Multi-modal Entity Alignment (MMEA) aims to identify equivalent entities across different multi-modal knowledge graphs (MMKGs), facilitating their integration and enhancing coverage. However, current MMEA datasets have limitations, including low entity coverage, a single image per entity, high inter-image correlation, and images sourced from the same search engine, which do not reflect real-world challenges. The fair comparison and development of alignment solutions may be hindered by these oversimplified scenarios. To address this problem, in this work, we first construct M3, an MMEA benchmark equipped with multiple images from different search engines in real-world scenarios. Additionally, we design a simple and universal multi-image processing module (AMIA), which assigns varying attention weights to images associated with entities to effectively model visual information. Experimental results validate the difficulty of M3, as well as the effectiveness of AMIA. Despite the superior performance of AMIA, there is still room for developing more advanced solutions to address these difficulties. Our dataset is publicly released.
Crafting effective features is a crucial yet labor-intensive and domain-specific task within machine learning pipelines. Fortunately, recent advancements in Large Language Models (LLMs) have shown promise in automating various data science tasks, including feature engineering. But despite this potential, evaluations thus far are primarily based on the end performance of a complete ML pipeline, providing limited insight into precisely how LLMs behave relative to human experts in feature engineering. To address this gap, we propose ELF-Gym, a framework for Evaluating LLM-generated Features. We curated a new dataset from historical Kaggle competitions, including 251 golden features used by top-performing teams. ELF-Gym then quantitatively evaluates LLM-generated features by measuring their impact on downstream model performance as well as their alignment with expert-crafted features through semantic and functional similarity assessments. This approach provides a more comprehensive evaluation of disparities between LLMs and human experts, while offering valuable insights into specific areas where LLMs may have room for improvement. For example, using ELF-Gym we empirically demonstrate that, in the best-case scenario, LLMs can semantically capture approximately 56% of the golden features, but at the more demanding implementation level this overlap drops to 13%. Moreover, in other cases LLMs may fail completely, particularly on datasets that require complex features, indicating broad potential pathways for improvement.
The prevalence of check fraud, particularly with stolen checks sold on platforms such as Telegram, creates significant challenges for both individuals and financial institutions. This underscores the urgent need for innovative solutions for detecting and preventing such fraud on social media platforms. While deep learning techniques show great promise in detecting objects and extracting information from images, their effectiveness in addressing check fraud is hindered by the lack of comprehensive, open-source, large training datasets specifically for check information extraction. To bridge this gap, this paper introduces "CheckGuard," a large labeled image-to-text cross-modal dataset designed for check information extraction. CheckGuard comprises over 7,000 real-world stolen check image segments from more than 15 financial institutions, featuring a variety of check styles and layouts. These segments have been manually labeled, resulting in over 50,000 samples across seven key elements: Drawer, Payee, Amount, Date, Drawee, Routing Number, and Check Number. This dataset supports various tasks such as visual question answering (VQA) on checks and check image captioning. Our paper details the rigorous data collection, cleaning, and annotation processes that make CheckGuard a valuable resource for researchers in check fraud detection, machine learning, and multimodal large language models (MLLMs). We not only benchmark state-of-the-art (SOTA) methods on this dataset to assess their performance but also explore potential enhancements. Our application of parameter-efficient fine-tuning (PEFT) techniques to the SOTA MLLMs demonstrates significant performance improvements, providing valuable insights and practical approaches for enhancing model efficacy on this task. As an evolving project, CheckGuard will continue to be updated with new data, enhancing its utility and driving further advancements in the field. Our PEFT-based MLLM code is available at: https://github.com/feizhao19/CheckGuard. For data access, researchers are required to contact the authors directly.
Climate change has led to a sharp increase in the number and severity of extreme events, such as floods, tornadoes and wildfires. These events have had adverse effects on human lives and infrastructure. Swift disaster assessment is crucial for the effective planning of disaster response and relief efforts. AI and big data have provided unprecedented opportunities to enable swift disaster assessment, but two significant hurdles exist: (1) the scarcity of annotated geospatial data to train AI models, and (2) the lack of AI solutions that encode physics knowledge in a geospatial context. My research aims to address both challenges by developing an active-learning-based annotation platform that improves annotation productivity for geospatial machine learning, and by developing physics-guided machine learning models for accurate natural disaster assessment.
Theory of Mind (ToM) reasoning involves understanding that others have unique mental states, such as beliefs, thoughts, intentions, viewpoints, and emotions, that differ from one's own, and incorporating this understanding into one's reasoning. While some research suggests that LLMs possess reasoning abilities, other studies challenge this assertion, often focusing on structured responses and overlooking the complexities of open-ended interactions. As LLMs are increasingly employed in different sectors, their ability to accurately interpret human mental states in reasoning becomes critical. For example, in psychological services, if LLMs generate reasoning responses without understanding human mental states, their answers may lack logical soundness and potentially exacerbate client distress. Therefore, understanding LLMs' ToM capabilities is crucial to ensure they deliver effective and appropriate responses in real-world scenarios. In this research, I investigate how incorporating questioners' viewpoints into questions, whether posed in a rational or an intuitive manner, affects the reasoning answers generated by LLMs, and how these generated answers align with human-written responses. The results demonstrate that incorporating these viewpoints into the prompt instructions enhances the reasoning performance of LLMs, although the responses still fall short of being truly human-like. This research contributes to the information retrieval and generative AI community by raising awareness about the limitations of LLMs in reasoning and their alignment with human responses in this domain.
Designing and analyzing network science algorithms, such as node classification, link prediction, and pattern identification (communities, triangles, dense subgraphs, cliques), requires diverse real-world datasets for performance evaluation. However, these datasets are often limited and small due to privacy concerns and platform access policies. This scarcity is even more pronounced for signed networks, as negative relationship data is rarely shared publicly. This PhD thesis aims to address this problem by generating realistic synthetic signed networks using the SNSRM and SISSRM models. Preserving the mesoscopic spectral and structural characteristics of the input signed network is crucial in this process. Additionally, this thesis tackles the challenge of efficiently analyzing the elementary network property of triad enumeration by developing a triangle counting algorithm capable of enumerating balanced and unbalanced triads.
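As background for the triad enumeration task mentioned above, the following minimal sketch (an illustration of the standard balance criterion on a toy graph, not the thesis algorithm) counts balanced and unbalanced triangles in a small signed network, where a triad is balanced when the product of its edge signs is positive:

```python
from itertools import combinations

# Toy signed network: undirected edges with sign +1 (friend) or -1 (foe).
signs = {
    frozenset({"a", "b"}): +1,
    frozenset({"b", "c"}): -1,
    frozenset({"a", "c"}): -1,
    frozenset({"c", "d"}): +1,
    frozenset({"b", "d"}): +1,
}
nodes = {v for edge in signs for v in edge}

balanced, unbalanced = 0, 0
for u, v, w in combinations(sorted(nodes), 3):
    edges = [frozenset({u, v}), frozenset({v, w}), frozenset({u, w})]
    if all(e in signs for e in edges):              # the triad forms a triangle
        product = signs[edges[0]] * signs[edges[1]] * signs[edges[2]]
        if product > 0:
            balanced += 1                           # +++ or +-- sign patterns
        else:
            unbalanced += 1                         # ++- or --- sign patterns

print(balanced, unbalanced)                         # 1 balanced, 1 unbalanced here
```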
Knowledge Graphs (KGs) have been used to organize large datasets into structured, interconnected information, enhancing data analytics across various fields. In the legislative context, one potential natural application of KGs is modeling the intricate set of interconnections that link laws and their articles with each other and the broader legislative context.
At the same time, the rise of large language models (LLMs) such as GPT has opened new opportunities in legal applications, such as text generation and document drafting. Despite their potential, the use of LLMs in legislative contexts is delicate, since it requires the absence of hallucinations and reliance on up-to-date information, as new laws are published on a daily basis.
This work investigates how Legislative Knowledge Graphs and LLMs can synergize to support legislative processes. We address three key questions: what benefits KGs bring to legislative systems, how LLMs can support legislative activities while ensuring accurate output, and how we can allow non-technical users to use such technologies in their activities. To this end, we develop the Legis AI Platform, an interactive platform focused on Italian legislation that enhances the possibility of conducting legislative analysis and aims to support lawmaking activities.
Detecting false information on social media is critical in mitigating its negative societal impacts. To reduce the propagation of false information, automated detection provides scalable, unbiased, and cost-effective methods. However, we identify three research areas which, once addressed, would improve detection. First, current AI-based solutions often provide a uni-dimensional analysis of a complex, multi-dimensional issue, with solutions differing based on the features used. Furthermore, these methods do not account for the temporal and dynamic changes observed within a document's life cycle. Second, there has been little research on the detection of coordinated information campaigns and on understanding the intent of the actors and the campaign. Third, there is a lack of cross-platform analysis, with existing datasets focusing on a single platform, such as X, and detection models designed for a specific platform.
This work aims to develop methods for the effective detection of false information and its propagation. To this end, we first propose an ensemble, multi-faceted framework that leverages multiple aspects of false information. Second, we propose a method to identify actors and their intent when they work in coordination to manipulate a narrative. Third, we aim to analyse the impact of cross-platform interactions on the propagation of false information through the creation of a new dataset.
Human beings aspire to a better life. Financial well-being enables this. However, a lack of financial literacy, ever-growing wealth inequality, and persuasive illicit information circulating on social media inhibit one's progress towards good fortune. In this paper, we discuss four pillars where Natural Language Processing can help improve financial literacy, reduce wealth disparity, ensure a sustainable future, and support economic prosperity. These pillars are: Inclusive investing, Improved investing, Impactful (green) investing, and Informed investing. Additionally, we focus specifically on the Indian market (Indic investing) and present several resources to enhance the comprehensibility of financial texts. Inclusive investing deals with enhancing the readability and reachability of financial texts. Improved investing addresses the need to simplify investors' journeys by providing them with hypernyms and relations between entities. Impactful investing is associated with focusing on sustainable pathways. Informed investing is about eradicating finance-related misinformation from social media, for example by evaluating the trustworthiness of posts by executives and detecting in-claim and exaggerated numerals. In most cases, we are able to demonstrate the efficacy of our approaches by benchmarking them against existing state-of-the-art methods.
This study examines the intersection between social media and mainstream television (TV) news with an aim to understand how social media content amplifies its impact through TV broadcasts. While many studies emphasize social media as a primary platform for information dissemination, they often underestimate its total influence by focusing solely on interactions within the platform. This research examines instances where social media posts gain prominence on TV broadcasts, reaching new audiences and prompting public discourse. By using TV news closed captions, on-screen text recognition, and social media logo detection, we analyze how social media is referenced in TV news. Our methodology aims to analyze this data to develop metrics that quantify the extent of this amplification and understand the contexts in which social media is integrated into broadcast content.
In the realm of personalized recommendation systems, accurately capturing user preferences and item characteristics is important for delivering relevant and satisfying recommendations. This study introduces innovative approaches using Large Language Models (LLMs) to generate detailed textual descriptions that enhance both user and item representations. We propose a dual strategy: for user representation, we employ supervised fine-tuning coupled with Retrieval-Augmented Generation (RAG) to keep the model current with dynamic user preferences; for item representation, we leverage the extensive knowledge base of LLMs to enrich item descriptions and infer traits from user interactions. These methods promise a deeper, more nuanced understanding of both users and items, potentially leading to superior recommendation accuracy. We adopt a rigorous evaluation methodology, ensuring the reliability of our results and the effectiveness of our proposed system. This paper discusses these methodologies, presents our preliminary findings, and highlights the potential of text-augmented profiles in advancing recommendation systems.
Recently, Knowledge Graphs (KGs) have been successfully coupled with Large Language Models (LLMs) to mitigate their hallucinations and enhance their reasoning capability, e.g., in KG-based retrieval-augmented frameworks for question answering. However, current KG-LLM frameworks lack rigorous uncertainty estimation, limiting their reliable deployment in high-stakes applications where the cost of errors is significant. To address this crucial gap, we propose a new trustworthy KG-LLM framework, UaG (Uncertainty-Aware Graph reasoning), which incorporates uncertainty quantification into the KG-LLM framework. We design an uncertainty-aware multi-step reasoning framework that leverages conformal prediction to provide a theoretical guarantee on the prediction set. To manage the error rate of the multi-step process, we additionally introduce an error rate control module to adjust the error rate within the individual components. Our preliminary results demonstrate that UaG can achieve the desired theoretical coverage while maintaining a reasonable prediction set size.
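For readers unfamiliar with the conformal prediction guarantee invoked above, the following minimal sketch (a generic split-conformal example with synthetic scores, not the UaG implementation) shows how a calibration set yields prediction sets with a target coverage level:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1                      # target miscoverage rate (90% coverage)

# Calibration set: softmax scores over 3 candidate answers plus the true label.
cal_probs = rng.dirichlet(np.ones(3), size=200)
cal_labels = rng.integers(0, 3, size=200)

# Nonconformity score: 1 - probability assigned to the true label.
cal_scores = 1.0 - cal_probs[np.arange(200), cal_labels]

# Conformal quantile with the finite-sample correction.
n = len(cal_scores)
q_level = np.ceil((n + 1) * (1 - alpha)) / n
qhat = np.quantile(cal_scores, q_level, method="higher")

# Prediction set for a new example: keep every label whose score passes the threshold.
test_probs = rng.dirichlet(np.ones(3))
prediction_set = [label for label in range(3) if 1.0 - test_probs[label] <= qhat]
print(prediction_set)
```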
Tabular data synthesis has become crucial for financial applications such as fraud detection, especially where data privacy regulations such as the General Data Protection Regulation (GDPR) restrict access to original data. Despite its importance, current generative models inadequately address key challenges in financial fraud detection (FFD) data, namely extreme class imbalance, high data sparsity, and non-normal attribute distributions. My research introduces novel graph-theoretical generative models, SeparateGGM and SignedGGM, designed to tackle these challenges. By integrating graph neural network-based feature engineering, graph topology and connectivity analysis, and novel graph centrality indicators, my models achieve optimal graph settings for enhanced fraud detection accuracy. This approach is pioneering in its application of diverse graph-theoretical methods to improve FFD performance. Preliminary results demonstrate my models' superiority over competing methods on multiple FFD benchmark datasets. The goal of this research is to significantly advance real-world financial fraud detection techniques and to show the data science community that graph-theoretical methodologies can significantly contribute to generating high-quality synthetic tabular data that enhances fraud detection accuracy.
Recommender systems play an essential role in determining the content users encounter on social media platforms and in uncovering relevant news. However, they also present significant risks, such as reinforcing biases, over-personalizing content, fostering filter bubbles, and inadvertently promoting misinformation. The spread of false information is rampant across various online platforms, such as Twitter (now X), Meta, and TikTok, especially noticeable during events like the COVID-19 pandemic and the US Presidential elections. These instances underscore the critical necessity for transparency and regulatory oversight in the development of recommender systems. Given the challenge of balancing free speech with the risks of outright removal of fake news, this paper aims to address the spread of misinformation from algorithmic biases in recommender systems using a social science perspective.
Proteins serve as the workhorses of living organisms, orchestrating a wide array of vital functions. Post-translational modifications (PTMs) of their amino acids greatly influence the structural and functional diversity of different protein types. However, current protein language models (pLMs) like ESM-2 and ProtT5 do not account for PTMs. We introduce PTM-Mamba, a PTM-aware pLM that integrates PTM information into protein sequence modeling. PTM-Mamba leverages structured state space models (SSMs) and incorporates PTM tokens into its training regime. Our results show that PTM-Mamba improves performance on PTM-specific tasks and is the first pLM capable of representing both wild-type and PTM sequences.
Conventional machine learning systems operate on the assumption of independent and identically distributed (i.i.d.) data, where the training and test data share a similar sample space and no distribution shift exists between them. However, this assumption does not hold in practical deployment scenarios, making it crucial to develop methodologies that address the non-trivial problem of data distribution shift. In our research, we aim to address this problem by developing ML algorithms that explicitly achieve promising performance when subjected to various types of out-of-distribution (OOD) data. Specifically, we categorize data distribution shifts into two types, covariate shifts and semantic shifts, and propose effective methodologies to tackle each type independently and conjointly while validating them on different types of datasets. We aim to propose ideas that are compatible with existing deep neural networks to perform detection and/or generalization for test instances that are shifted in semantic and covariate space, respectively.
With the rapid development of location-based services, multimodal spatio-temporal (ST) data, including trajectories, transportation modes, traffic flow and social check-ins, are being collected for deep learning based methods. These methods learn ST correlations to support downstream tasks in fields such as smart mobility, smart cities and other intelligent transportation systems. Despite their effectiveness, ST data fusion and forecasting methods face practical challenges in real-world scenarios. First, forecasting performance in areas with insufficient ST data is inferior, making it necessary to transfer meta-knowledge from heterogeneous areas to enhance the sparse representations. Second, it is nontrivial to forecast accurately in multi-transportation-mode scenarios due to the fine-grained ST features of similar transportation modes, making it necessary to distinguish and measure the ST correlations to alleviate the influence of entangled ST features. Finally, some data modalities (e.g., transportation mode) are lost due to privacy or technical issues in certain scenarios, making it necessary to effectively fuse the sparse multimodal ST features and enrich the ST representations. To tackle these challenges, our research aims to develop effective fusion and forecasting methods for multimodal ST data in smart mobility scenarios. In this paper, we introduce our recent work that investigates these challenges in various real-world applications and establish the open challenges in this field for future work.
Automated fact-checking has emerged as a safeguard against the spread of false information. Existing fact-checking approaches aim to determine whether a news claim is true or false, and they have achieved decent veracity-prediction accuracy. However, the current state-of-the-art models still face challenges, such as ambiguity in the claims and lack of contextual information. This study introduces a fact-checking model, Path-FC, which focuses on 1) augmenting the representations of claims and evidence by incorporating additional context from Knowledge Paths extracted from an external Knowledge Graph; and 2) identifying false claims by learning the differences between claims and evidence. The experimental results demonstrate that Knowledge Path retrieval, combined with the multi-head attention technique, contributes to improved fact-checking performance. The code is available at https://anonymous.4open.science/r/Path-FC.
In today's digital landscape, Deep Recommender Systems (DRS) play a crucial role in navigating and customizing online content for individual preferences. However, conventional methods, which mainly depend on a single recommendation task, scenario, data modality and user behavior, are increasingly seen as insufficient due to their inability to accurately reflect users' complex and changing preferences. This gap underscores the need for multi-granularity modeling, which is central to overcoming these limitations by integrating diverse tasks, scenarios, modalities, and behaviors in the recommendation process, thus promising significant enhancements in recommendation precision, efficiency, and customization. In this paper, we illustrate our existing explorations from the multi-scenario perspective and present results. Ultimately, we hope to show that our multi-granularity approach sheds light on building the next generation of recommender systems.
Integrating Large Language Models (LLMs) with external tools and APIs is essential for fields such as information retrieval and knowledge management. While LLMs have made significant strides, their effective integration with external APIs, essential for real-world applications, remains challenging. This paper introduces RESTful-Llama, a novel method designed to empower open-source LLMs to accurately convert natural language instructions into well-formed RESTful API calls. Moreover, RESTful-Llama utilizes DOC-Prompt, a newly proposed technique for generating fine-tuning datasets from publicly available API documentation. Initial experiments demonstrate that RESTful-Llama significantly enhances the accuracy of generated REST API requests.
This paper explores three key aspects of causal discovery from heterogeneous time series. First, it introduces a method that uses (conditional) mutual information to determine (conditional) independence among diverse qualitative and quantitative variables, alongside a novel local permutation test. Next, the paper presents a new algorithm for identifying event-based causal relations in threshold-based IT systems for root cause analysis. This method is effective only when root causes are not causally related, and an extension involving agent intervention is proposed to address this limitation. Both the algorithm and its extension utilize causal discovery from offline data and subgraph traversal for new anomalies in online data. Finally, the paper addresses time series with multiple piecewise consistent regimes, each having distinct causal mechanisms. The proposed method segments the time series into appropriate regimes and identifies the correct window causal graph, capturing both instantaneous and lagged connections within each regime. Experiments with synthetic data confirm the effectiveness of the proposed methods.
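To make the permutation-testing idea above concrete, here is a minimal sketch (a generic unconditional variant on discrete toy data, not the paper's local permutation test) that tests independence by comparing observed mutual information against a permutation null distribution:

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in mutual information estimate for two discrete integer arrays."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=500)
y = (x + rng.integers(0, 2, size=500)) % 3   # y depends on x

observed = mutual_information(x, y)
# Null distribution: shuffling y breaks any dependence while keeping marginals.
null = [mutual_information(x, rng.permutation(y)) for _ in range(200)]
p_value = float(np.mean([mi >= observed for mi in null]))
print(f"MI={observed:.3f}, permutation p-value={p_value:.3f}")
```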
Submodular function optimization is a fundamental tool for modeling complex interactions in machine learning and graph mining problems. We propose to study constrained submodular optimization to improve the current state of the art. Our goals are to design evolutionary algorithms with stronger approximation guarantees, to study submodular maximization under submodular constraints, fairness in submodular optimization, and k-submodular optimization. We begin by exploring a basic submodular maximization problem in the context of viral marketing and information diffusion in social networks. From there, we broaden our scope to a wider range of scenarios and formulate them as further optimization problems. Looking ahead, we plan to tackle scalable submodular optimization problems under fairness and dynamic constraints, in dynamic streams, and in distributed settings. Lastly, we discuss a series of real-world applications that can be formulated as submodular optimization problems. In the future, we aim to apply these algorithmic ideas to solve more real-world problems.
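As a concrete reference point for the constrained submodular maximization problems discussed above, the following minimal sketch (a textbook greedy baseline on a toy coverage function, not one of the proposed evolutionary algorithms) achieves the classical (1 - 1/e) approximation under a cardinality constraint:

```python
# Toy influence/coverage instance: each candidate seed covers a set of users.
coverage = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}

def f(selected):
    """Monotone submodular objective: number of users covered."""
    covered = set()
    for s in selected:
        covered |= coverage[s]
    return len(covered)

def greedy(k):
    """Classical greedy for max f(S) s.t. |S| <= k; (1 - 1/e)-approximate."""
    selected = []
    for _ in range(k):
        gains = {c: f(selected + [c]) - f(selected)
                 for c in coverage if c not in selected}
        best = max(gains, key=gains.get)
        if gains[best] == 0:          # no marginal gain left
            break
        selected.append(best)
    return selected

seeds = greedy(2)
print(seeds, f(seeds))                # picks 'c' then 'a', covering all 7 users
```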
This tutorial offers a hands-on introduction to the captivating field of quantum machine learning (QML). Beginning with the bedrock of quantum information science (QIS), including essential elements like qubits, single- and multi-qubit gates, measurements, and entanglement, the session swiftly progresses to foundational QML concepts. Participants will explore parameterized or variational circuits, data encoding or embedding techniques, and quantum circuit design principles. Delving deeper, attendees will examine various QML models, including the quantum support vector machine (QSVM), the quantum feed-forward neural network (QNN), and the quantum convolutional neural network (QCNN). Pushing boundaries, the tutorial delves into cutting-edge QML models such as quantum recurrent neural networks (QRNN) and quantum reinforcement learning (QRL), alongside privacy-preserving techniques like quantum federated machine learning, bolstered by concrete programming examples. Throughout the tutorial, all topics and concepts are brought to life through practical demonstrations executed on a quantum computer simulator. Designed with novices in mind, the content caters to those eager to embark on their journey into QML. Attendees will also receive guidance on further reading materials, as well as software packages and frameworks to explore beyond the session.
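To give a flavour of the parameterized-circuit idea mentioned above, the following minimal sketch (our own plain-NumPy simulation, independent of whichever simulator the tutorial uses) rotates a single qubit by a trainable angle and optimizes the expectation of Pauli-Z, the basic building block of a variational QML model:

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation gate."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def expectation_z(theta):
    """Prepare RY(theta)|0> and return <Z> = cos(theta)."""
    state = ry(theta) @ np.array([1.0, 0.0])      # start in |0>
    z = np.array([[1.0, 0.0], [0.0, -1.0]])       # Pauli-Z observable
    return float(state @ z @ state)

# A one-parameter "variational circuit": tune theta so <Z> matches a target.
theta, target, lr = 0.1, -1.0, 0.2
for _ in range(100):
    # Parameter-shift rule gives the exact gradient of <Z> w.r.t. theta.
    grad = 0.5 * (expectation_z(theta + np.pi / 2) - expectation_z(theta - np.pi / 2))
    theta -= lr * 2 * (expectation_z(theta) - target) * grad

print(theta, expectation_z(theta))   # theta approaches pi, <Z> approaches -1
```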
In recent years, Graph Neural Networks (GNNs) have attracted considerable attention. However, the rapid emergence of diverse GNN models, each grounded in different theoretical foundations, complicates the model selection process, as these models are not easily understood within a unified framework. Initial GNNs were constructed using spectral theory, while others were developed based on spatial theory. This theoretical divergence makes direct comparisons difficult. Furthermore, the variety of models within each theoretical domain further complicates their evaluation. In this tutorial, we explore state-of-the-art GNNs and present a comprehensive framework that bridges the spatial and spectral domains, clarifying their interrelationship. This framework deepens our understanding of GNN operations. The tutorial delves into key paradigms, such as spatial and spectral methods, through a synthesis of spectral graph theory and approximation theory. We conduct an in-depth analysis of recent research advancements, addressing emerging issues like over-smoothing, using well-established GNN models to illustrate the universality of our framework.
Large Language Models (LLMs) have demonstrated remarkable success across various domains but often lack fairness considerations, potentially leading to discriminatory outcomes against marginalized populations. Unlike fairness in traditional machine learning, fairness in LLMs involves unique backgrounds, taxonomies, and fulfillment techniques. This tutorial provides a systematic overview of recent advances in the literature concerning fair LLMs, beginning with real-world case studies to introduce LLMs, followed by an analysis of bias causes therein. The concept of fairness in LLMs is then explored, summarizing the strategies for evaluating bias and the algorithms designed to promote fairness. Additionally, resources for assessing bias in LLMs, including toolkits and datasets, are compiled, and current research challenges and open questions in the field are discussed. The repository is available at https://github.com/LavinWong/Fairness-in-Large-Language-Models.
The proliferation of large language models (LLMs) has catalyzed a diverse array of applications. This tutorial delves into the application of LLMs for tabular data and targets a variety of table-related tasks, such as table understanding, text-to-SQL conversion, and tabular data preprocessing. It surveys LLM solutions to these tasks in five classes, categorized by their underpinning techniques: prompting, fine-tuning, RAG, agents, and multimodal methods. It discusses how LLMs offer innovative ways to interpret, augment, query, and cleanse tabular data, featuring academic contributions and their practical use in the industrial sector. It emphasizes the versatility and effectiveness of LLMs in handling complex table tasks, showcasing their ability to improve data quality, enhance analytical capabilities, and facilitate more intuitive data interactions. By surveying different approaches, this tutorial highlights the strengths of LLMs in enriching table tasks with more accuracy and usability, setting a foundation for future research and application in data science and AI-driven analytics. Presentation slides for this tutorial will be available at: https://dongyuyang.github.io/tableLLM-tutorial/ .
Tabular data are the most widely used data format in almost every application domain, such as biology, ecology, and materials science. The purpose of tabular data-centric AI is to use AI to augment the predictive power of tabular data and thereby obtain better AI. Tabular data-centric AI is essential because it can reconstruct distance measures, reshape discriminative patterns, and improve data AI-readiness (at the structural, predictive, interaction, and expression levels), which is significant for industries and real-world deployments. Therefore, our tutorial is designed to capture the interest of professionals with expertise in artificial intelligence, machine learning, and data mining, as well as researchers engaged in specific application areas and interdisciplinary studies. Examples of such applications include quality control, predictive maintenance, supply chain optimization, process efficiency improvements, biomarker identification, and material performance screening. In this tutorial, we will explore the emerging field of Tabular Data-Centric AI. Our discussion will provide a comprehensive overview of this domain: (1) We will demonstrate the different settings within this research domain based on distinct application scenarios. (2) We will identify and explain the significant challenges encountered in tabular data-centric AI. (3) We will highlight existing methods and benchmarks. (4) We will discuss future potential directions for this domain and examine its interconnections with other research areas. To enhance the learning experience, this tutorial will include a hands-on section designed to teach participants the fundamental aspects of developing, evaluating and visualizing techniques in tabular data-centric AI. After this tutorial, attendees will have a deep understanding of tabular data-centric AI research, including its key challenges, seminal techniques, and insights into integrating tabular data-centric AI into their own research.
"The previous era was about information at your fingertips; I think of the AI era as expertise at your fingertips." - Satya Nadella, CNBC
This tutorial explores Large Language Model (LLM)-based autonomous agents, addressing the lack of comprehensive guides on the topic. It systematically examines key components such as profiling, perception, memory, planning, and action, using an established taxonomy. The tutorial also extends the discussion to multi-agent frameworks, offering insights into collaborative intelligence. Additionally, it compares popular open-source frameworks for LLM-based agent development and discusses evaluation methodologies, focusing on efficiency and safety. The tutorial aims to catalyze dialogue and partnership among practitioners, propelling forward the integration of robust and effective LLM agent systems into the production environment.
Temporal graphs capture dynamic node relations via temporal edges, finding extensive utility in wide domains where time-varying patterns are crucial. Temporal Graph Neural Networks (TGNNs) have gained significant attention for their effectiveness in representing temporal graphs. However, TGNNs still face significant efficiency challenges in real-world low-resource settings. First, from a data-efficiency standpoint, training TGNNs requires sufficient temporal edges and data labels, which is problematic in practical scenarios with limited data collection and annotation. Second, from a resource-efficiency perspective, TGNN training and inference are computationally demanding due to complex encoding operations, especially on large-scale temporal graphs. Minimizing resource consumption while preserving effectiveness is essential. Inspired by these efficiency challenges, this tutorial systematically introduces state-of-the-art data-efficient and resource-efficient TGNNs, focusing on algorithms, frameworks, and tools, and discusses promising yet under-explored research directions in efficient temporal graph learning. This tutorial aims to benefit researchers and practitioners in data mining, machine learning, and artificial intelligence.
Recent years have seen a significant shift in Artificial Intelligence from model-centric to data-centric approaches, highlighted by the success of large foundational models. Following this trend, despite numerous innovations in graph machine learning model design, graph-structured data often suffers from data quality issues, jeopardizing the progress of Data-centric AI in graph-structured applications. Our proposed tutorial addresses this gap by raising awareness about data quality issues within the graph machine-learning community. We provide an overview of existing topology, imbalance, bias, limited data, and abnormality issues in graph data. Additionally, we highlight recent developments in foundational graph models that focus on identifying, investigating, mitigating, and resolving these issues.
Over the past two years, generative AI (GAI) has evolved rapidly, influencing various fields including social and e-commerce Recsys. Despite exciting advances, landing these innovations in real-world Recsys remains challenging due to the sophistication of modern industrial products and systems. Our tutorial begins with a brief overview of building industrial Recsys and GAI fundamentals, followed by the ongoing efforts and opportunities to enhance personalized recommendations with foundation models.
We then explore the integration of curation capabilities into Recsys, such as repurposing raw content, incorporating external knowledge, and generating personalized insights/explanations to foster transparency and trust. Next, the tutorial illustrates how AI agents can transform Recsys through interactive reasoning and action loops, shifting away from traditional passive feedback models. Finally, we shed insights on real-world solutions for human-AI alignment and responsible GAI practices.
A critical component of the tutorial is detailing the AI, Infrastructure, LLMOps, and Product roadmap (including the evaluation and responsible AI practices) derived from the production solutions in LinkedIn, Amazon, TikTok, and Microsoft. While GAI in Recsys is still in its early stages, this tutorial provides valuable insights and practical solutions for the Recsys and GAI communities.
In the pursuit of justice and accountability in the digital age, the integration of Large Language Models (LLMs) with digital forensics holds immense promise. This half-day tutorial provides a comprehensive exploration of the transformative potential of LLMs in automating digital investigations and uncovering hidden insights. Through a combination of real-world case studies, interactive exercises, and hands-on labs, participants will gain a deep understanding of how to harness LLMs for evidence analysis, entity identification, and knowledge graph reconstruction. By fostering a collaborative learning environment, this tutorial aims to empower professionals, researchers, and students with the skills and knowledge needed to drive innovation in digital forensics. As LLMs continue to revolutionize the field, this tutorial will have far-reaching implications for enhancing justice outcomes, promoting accountability, and shaping the future of digital investigations.
Graph-theoretic algorithms and graph machine learning models are essential tools for addressing many real-life problems, such as social network analysis and bioinformatics. To support large-scale graph analytics, graph-parallel systems have been actively developed for over a decade, such as Google's Pregel and Spark's GraphX, which (i) promote a think-like-a-vertex computing model and target (ii) iterative algorithms and (iii) problems that output a value for each vertex. However, this model is too restrictive to support the rich set of heterogeneous operations for graph analytics and machine learning that many real applications demand.
In recent years, two new trends have emerged in graph-parallel systems research: (1) a novel think-like-a-task computing model that can efficiently support the various computationally expensive problems of subgraph search; and (2) scalable systems for learning graph neural networks. These systems effectively complement the diverse needs of graph-parallel tools that can flexibly work together in a comprehensive graph processing pipeline for real applications, with the capability of capturing structural features. This tutorial will provide an effective categorization of the recent systems in these two directions based on their computing models and adopted techniques, and will review the key design ideas of these systems.
Understanding online behaviors, communities, and trends through social media analytics is becoming increasingly important. Recent changes in the accessibility of platforms like Twitter have made Mastodon a valuable alternative for researchers. In this tutorial, we will explore methods for collecting and analyzing public data from Mastodon, a decentralized micro-blogging social network. Participants will learn about the architecture of Mastodon, techniques and best practices for data collection, and various analytical methods to derive insights from the collected data. This session aims to equip researchers with the skills necessary to harness the potential of Mastodon data in computational social science and social data science research.
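As a taste of the kind of collection workflow covered in the tutorial, the following minimal sketch (our own example against the standard public Mastodon REST endpoint; the instance name is an arbitrary choice) pages through recent public posts of one instance:

```python
import requests

INSTANCE = "https://mastodon.social"          # any public instance works
endpoint = f"{INSTANCE}/api/v1/timelines/public"

posts, max_id = [], None
for _ in range(3):                            # fetch three pages of 40 statuses
    params = {"limit": 40, "local": "true"}
    if max_id:
        params["max_id"] = max_id             # paginate backwards in time
    resp = requests.get(endpoint, params=params, timeout=30)
    resp.raise_for_status()
    page = resp.json()
    if not page:
        break
    posts.extend(page)
    max_id = page[-1]["id"]                   # oldest status id on this page

print(len(posts), "statuses collected")
print(posts[0]["created_at"], posts[0]["account"]["acct"])
```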
The peer review process is a fundamental aspect of academic publishing, ensuring the quality and credibility of scholarly work. In this talk, we will explore the critical challenges associated specifically with the assignment of reviewers to submitted papers. We will introduce Reviewerly, our innovative solution designed to enhance the efficiency and effectiveness of reviewer assignments by leveraging data from diverse sources, including OpenAlex, PubMed, and DBLP. By modeling the reviewer assignment problem as an information retrieval task, we focus on retrieving a pool of relevant and diverse reviewers for each paper.
Large Language Models (LLMs) demonstrate impressive abilities across a wide range of NLP tasks. However, their underlying architecture and design come with inherent limitations, which result in issues like hallucinations and constrained reasoning capabilities. Additionally, creating an autonomous AI agent capable of handling complex real-world tasks demands access to real-time information, sensitive data, or external tools, capabilities that most LLMs currently lack. Addressing these issues may require augmenting LLMs with external knowledge through function calling. These function calls serve as an interface between LLMs and the world, enabling access to real-time data, diverse tools, reasoning systems, knowledge graphs, APIs, plugins, code interpreters, and more.
The primary objective of this talk is to highlight the significance of function-calling capabilities in bridging the knowledge gap in LLMs, showcase recent research advancements in this area, and discuss existing challenges along with future directions. I will also present a training and benchmarking data suite for function calling, API-BLEND, and a function-calling model, Granite-20B-FunctionCalling.
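To illustrate the function-calling interface described in the two paragraphs above, here is a minimal sketch (a generic pattern with a hypothetical get_weather tool and a stubbed model response, not tied to API-BLEND or Granite) of how a model's structured call is parsed and dispatched to an external tool:

```python
import json

# Tool schema advertised to the model (hypothetical example tool).
TOOLS = {
    "get_weather": {
        "description": "Return current weather for a city",
        "parameters": {"city": "string"},
    }
}

def get_weather(city: str) -> dict:
    # Stub standing in for a real weather API call.
    return {"city": city, "temp_c": 21, "condition": "cloudy"}

DISPATCH = {"get_weather": get_weather}

# In a real system this JSON would be generated by the LLM after seeing TOOLS;
# here we hard-code a plausible model output for the query "Weather in Dublin?".
model_output = '{"name": "get_weather", "arguments": {"city": "Dublin"}}'

call = json.loads(model_output)
result = DISPATCH[call["name"]](**call["arguments"])

# The tool result is then fed back to the model to ground its final answer.
print(result)
```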
In the realm of e-Commerce, a fundamental problem is the accurate interpretation of a user's core intent. The intent is often expressed subtly and implicitly, or stated explicitly through verbose tokens or key phrases in a user query. In this work, we focus on the latter class of problems, where we identify the subset of query tokens that are the primary intent-bearing phrases conveying explicit intents. We do not frame this as an intent detection problem but rather as an immutable-component detection problem, because we believe that discovering the immutable phrases in a query entails that those are the intent-bearing phrases. Furthermore, identifying a certain set of query tokens as immutable ensures better downstream processing in terms of unprecedented token handling, query category detection and query rewrites. We have developed a BERT-based supervised model that can identify core-intent tokens, improving F1 score over the baseline by over 35%. Furthermore, we integrated our proposed approach into a query recovery strategy, which produces approximately 11.9% improvement in offline relevance scores compared to the production model.
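As an illustration of the kind of token-level model described above, the following minimal sketch (a generic Hugging Face token-classification setup with a placeholder checkpoint and untrained classification weights, not the speaker's production model) tags each query token as immutable or mutable:

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Placeholder backbone; a real system would fine-tune this on labeled queries.
MODEL_NAME = "bert-base-uncased"
LABELS = ["MUTABLE", "IMMUTABLE"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=len(LABELS))

query = "red running shoes size 10"
enc = tokenizer(query, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits             # shape: (1, num_tokens, num_labels)
pred_ids = logits.argmax(dim=-1)[0]

tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
for token, label_id in zip(tokens, pred_ids):
    if token not in tokenizer.all_special_tokens:
        print(f"{token:>10} -> {LABELS[label_id]}")
```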
Modern search systems offer multiple ways for expressing information needs, including image, voice, and text. Consequently, an increasing number of users seamlessly transition between these modalities to convey their intents. This emerging trend presents new opportunities for utilizing queries in different modalities to help users complete their search journeys efficiently. In this proposal, we introduce an approach to segmenting a multimodal query stream into missions, demonstrate how these in-mission queries can enhance search ranking, and outline key areas for future research.
In search, autocomplete (AC) is an essential tool that provides suggestions for each keystroke, functioning well with token-based queries. However, it is challenging to handle at scale when input queries are conversational and semantically rich. Identifying relevant queries for sub-tokens requires efficient lookup strategies, real-time ranking, and relevance in the results. This work integrates Retrieval-Augmented Generation (RAG), AI safety, and relevance ranking to produce autocomplete suggestions for conversational queries in a production system. RAG-based responses ensure a high hit ratio for popular AC inputs and maintain a very low risk category by not triggering any critical AI safety concerns.
This talk addresses the challenge of improving user experience on e-commerce platforms by enhancing the ranking of products relevant to a user's search query. Queries such as S2716DG consist of alphanumeric characters where a letter or number can signify an important detail of the product or model. The speaker describes recent research in which we curate samples from existing datasets at eBay, manually annotated with buyer-centric relevance scores and centrality scores, which reflect how well the product title matches the user's intent. We introduce a User-intent Centrality Optimization (UCO) approach for existing models, which optimizes for user intent in semantic product search. To that end, we propose a dual-loss based optimization to handle hard negatives, i.e., product titles that are semantically relevant but do not reflect the user's intent. Our contributions include curating a challenging evaluation set and implementing UCO, resulting in significant improvements in product ranking efficiency observed across different evaluation metrics. Our work aims to ensure that the most buyer-centric titles for a query are ranked higher, thereby enhancing the user experience on e-commerce platforms.
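To make the dual-loss idea above more tangible, here is a minimal sketch (a generic combination of a similarity loss for positives and a margin penalty for hard negatives; the weighting and exact formulation are our own assumptions, not the UCO specification):

```python
import torch
import torch.nn.functional as F

def dual_loss(query_emb, pos_emb, hard_neg_emb, margin=0.2, alpha=0.5):
    """Pull relevant titles toward the query while pushing hard negatives
    (semantically related but not intent-matching titles) below a margin."""
    pos_sim = F.cosine_similarity(query_emb, pos_emb)
    neg_sim = F.cosine_similarity(query_emb, hard_neg_emb)

    relevance_loss = (1.0 - pos_sim).mean()                       # attract positives
    hard_neg_loss = F.relu(neg_sim - pos_sim + margin).mean()     # repel hard negatives
    return alpha * relevance_loss + (1.0 - alpha) * hard_neg_loss

# Toy embeddings standing in for encoder outputs (batch of 4, dim 16).
q, p, n = (torch.randn(4, 16) for _ in range(3))
print(dual_loss(q, p, n).item())
```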
Ads Content Safety at Google requires classifying billions of ads against Google Ads content policies. Consistent and accurate policy enforcement is important for advertiser experience and user safety, but it is a challenging problem, so there is substantial value in improving it for advertisers and users. Inconsistent policy enforcement causes increased policy friction and a poor experience for good advertisers, while bad advertisers exploit the inconsistency by creating multiple similar ads in the hope that some will get through our defenses. This study proposes a method for understanding an advertiser's intent for content policy violations using Large Language Models (LLMs). We focus on identifying good advertisers to reduce content over-flagging and improve advertiser experience, though the approach can easily be extended to classify bad advertisers as well. We generate an advertiser's content profile based on multiple signals from their ads, domains, targeting info, etc. We then use LLMs to classify the advertiser content profile, along with any knowledge the LLM has of the advertiser, their products or brand, to assess whether they are likely to violate a certain policy. After minimal prompt tuning, our method was able to reach 95% accuracy on a small test set.
Large language models (LLMs) have transformed automated code generation. However, their high computational demands often lead to server overload and increased latency in SaaS deployments. To address this, we present SpeCoder, a framework that accelerates server-side code generation using speculative sampling (SpS) and supervised fine-tuning (SFT). SpS lowers code generation latency, whereas SFT enables more personalized code generation tailored to the user's needs.
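For readers unfamiliar with speculative sampling, the following minimal sketch (toy categorical distributions standing in for the draft and target models; not the SpeCoder implementation) shows the accept/reject rule that lets a cheap draft model propose tokens which the large model then verifies:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 5

def draft_probs(_ctx):   # cheap draft model (stub distribution)
    return np.array([0.5, 0.2, 0.1, 0.1, 0.1])

def target_probs(_ctx):  # large target model (stub distribution)
    return np.array([0.3, 0.3, 0.2, 0.1, 0.1])

def speculative_step(ctx):
    """Propose one token with the draft model, then accept or resample
    so that the final sample is distributed exactly as the target model."""
    q = draft_probs(ctx)
    p = target_probs(ctx)
    token = rng.choice(VOCAB, p=q)
    if rng.random() < min(1.0, p[token] / q[token]):
        return token, True                       # draft token accepted
    residual = np.maximum(p - q, 0.0)            # otherwise resample from the
    residual /= residual.sum()                   # normalized residual distribution
    return rng.choice(VOCAB, p=residual), False

accepted = sum(speculative_step([])[1] for _ in range(1000))
print(f"acceptance rate ~= {accepted / 1000:.2f}")
```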
Large language models (LLMs) have shown immense potential for applications in information retrieval and knowledge management, but their computational and memory demands pose challenges for resource-constrained devices. In response, this work introduces an FPGA-based accelerator designed to improve LLM inference performance on embedded devices. We leverage quantization techniques, asynchronous computation, and a fully-pipelined accelerator to enhance efficiency. Our empirical evaluations, conducted using the TinyLlama 1.1B model on a Xilinx ZCU102 platform, demonstrate a 14.3-15.8x speedup and a 6.1x energy efficiency improvement over running exclusively on the ZCU102 processing system (PS).
Compound words are a grammatical structure that allows forming new words by composing existing words. For e-commerce search in German, it is essential to split these compounds into meaningful parts because item titles often use the joint form while search queries are often split. We propose a method for German compound splitting leveraging a large language model (LLM) with a voting mechanism and a hyperparameter search for automatically optimizing prompt and parameter combinations. Our evaluation of the proposed method on human-created gold standard data for e-commerce shows that it outperforms existing methods for compound splitting in this domain.
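As an illustration of the voting mechanism described above, this minimal sketch (with a hypothetical llm_split stub in place of real LLM calls, and a simple majority rule rather than the paper's tuned configuration) aggregates several sampled splits of a German compound into one answer:

```python
from collections import Counter
import random

def llm_split(word: str, temperature: float) -> tuple:
    """Stub for an LLM call that proposes a compound split.
    A real system would prompt a model with the word and parse its answer."""
    candidates = [("damen", "winter", "jacke"), ("damenwinter", "jacke"), ("damen", "winterjacke")]
    weights = [0.6, 0.1, 0.3] if temperature < 1.0 else [0.4, 0.3, 0.3]
    return random.choices(candidates, weights=weights)[0]

def split_with_voting(word: str, temperatures=(0.2, 0.5, 0.8, 1.0, 1.2)) -> tuple:
    """Sample one split per temperature and return the majority answer."""
    votes = Counter(llm_split(word, t) for t in temperatures)
    return votes.most_common(1)[0][0]

print(split_with_voting("damenwinterjacke"))   # most often ('damen', 'winter', 'jacke')
```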
The way research and business manage and utilize knowledge is undergoing a significant transformation, driven by Artificial Intelligence (AI). Deep learning and machine learning are emerging as powerful tools for optimizing knowledge management systems, leading to more informed and productive development. AI offers unique solutions for organizations struggling with information overload and inefficient knowledge transfer, and these AI models can significantly improve data management and utilization. Imagine an AI-powered system that streamlines onboarding processes, provides precise answers to various queries, and even captures the valuable tacit knowledge (implicit skills and expertise) often residing within individuals. AI bridges the gap between explicit knowledge (easily documented information) and tacit knowledge, fostering a more comprehensive and accessible knowledge base. However, such AI systems call for trustworthy and responsible approaches to mitigate potential misuse and malfunction. In this workshop, we aim to gather researchers and engineers from academia and industry to discuss the latest advances in trustworthy and responsible AI solutions for information and knowledge management systems.
With the advent of multimodal LLMs and release of open-source multimodal models, the potential for multimodal search and recommendations has significantly increased. Multimodal systems offer a next-gen customer experience by creating a shared embedding space for text, images, audio, etc. These advancements enable more accurate, personalized recommendations, enhancing user satisfaction and engagement. This workshop on Multimodal Search and Recommendations explores the latest advancements, challenges, and applications of multimodal search and recommendations.
Recommender systems (RecSys) play important roles in helping users navigate, discover, and consume massive and highly dynamic information. Today, many RecSys solutions deployed in the real world rely on categorical user profiles and/or pre-calculated recommendation actions that stay static during a user session. However, recent trends suggest that RecSys need to model user intent in real time and constantly adapt to meet user needs at the moment or to change user behavior in situ. There are three primary drivers for this emerging need for online adaptation. First, in order to meet the increasing demand for a better personalized experience, the personalization dimensions and space will grow larger and larger; it is not feasible to pre-compute recommended actions for all personalization scenarios beyond a certain scale. Second, in many settings the system has no prior user history to leverage, and estimating user intent in real time is the only feasible way to personalize. As various consumer privacy laws tighten, it is foreseeable that many businesses will reduce their reliance on static user profiles, which makes the modeling of user intent in real time an important research topic. Third, a user's intent often changes within a session and between sessions, and user behavior can shift significantly during dramatic events. It is therefore important to further investigate online and adaptive recommender systems (OARS) that can adapt in real time to meet user needs and be robust against distribution shifts. Every year, the organizers survey the most important topics for OARS and propose a new workshop program. In light of the recent advancement of LLMs and foundation models in RecSys, in this new edition we have decided to formally add the topic of foundation and LLM models in OARS, and we will invite experts and papers in the field to facilitate its further advancement. Our workshop offers a focused discussion of new studies and applications of OARS, and will bring together an interdisciplinary community of researchers and practitioners from both industry and academia to discuss new topics in the area, grow a community, and push the direction forward.
Machine learning traditionally emphasizes developing models for given datasets, but real-world data is often messy, making model improvement insufficient for enhancing performance. Data-Centric AI (DCAI) is an emerging field that systematically improves datasets, leading to significant practical ML advancements. While experienced data scientists have manually refined datasets through trial-and-error and intuition, DCAI approaches data enhancement as a systematic engineering discipline. DCAI represents a shift from focusing on models to the underlying data used for training and evaluation. Despite the dominance of common model architectures and predictable scaling rules, building and using datasets remain labor-intensive and costly, lacking infrastructure and best practices. The DCAI movement aims to develop efficient, high-productivity open data engineering tools for modern ML systems. This workshop seeks to foster an interdisciplinary DCAI community to address practical data challenges, including data collection, generation, labeling, preprocessing, augmentation, quality evaluation, debt, and governance. By defining and shaping the DCAI movement, this workshop aims to influence the future of AI and ML, inviting interested parties to contribute through paper submissions.
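One way to picture data enhancement as a systematic engineering discipline rather than ad hoc inspection is to script repeatable quality checks that a pipeline can gate on. The sketch below audits a tiny, hypothetical labeled text dataset for duplicates, missing labels, and label imbalance; the column names and thresholds are illustrative assumptions, not a prescribed DCAI toolchain.

```python
import pandas as pd

def audit_dataset(df: pd.DataFrame, text_col: str, label_col: str) -> dict:
    """Small, repeatable data-quality audit: duplicates, missing values,
    and label distribution, reported as plain numbers a pipeline can gate on."""
    return {
        "n_rows": len(df),
        "n_duplicate_texts": int(df.duplicated(subset=[text_col]).sum()),
        "n_missing_labels": int(df[label_col].isna().sum()),
        "label_distribution": df[label_col].value_counts(normalize=True).to_dict(),
    }

# Hypothetical example: a tiny labeled text dataset with typical defects.
df = pd.DataFrame({
    "text": ["great product", "great product", "terrible", None, "okay I guess"],
    "label": ["pos", "pos", "neg", "pos", None],
})
print(audit_dataset(df, text_col="text", label_col="label"))
```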
Recommendation systems are used widely across many industries, such as e-commerce, multimedia content platforms, and social networks, to provide suggestions that users are most likely to consume or connect with, thereby improving the user experience. This motivates both industry and research organizations to focus on personalization and recommendation algorithms, resulting in a large body of research. While academic research mostly focuses on the performance of recommendation algorithms in terms of ranking quality or accuracy, it often neglects key factors that determine how a recommendation system will perform in a real-world environment, including but not limited to business metric definition and evaluation, scalability, recommendation quality control, robustness, fairness, and resource limitations such as computing and memory budgets, engineering workforce cost, etc. This gap in constraints and requirements between academic research and industry limits the broad applicability of many of academia's contributions to industrial recommendation systems. This workshop aspires to bridge the gap by bringing together researchers from both academia and industry. Its goal is to serve as a venue for industrial researchers to share practical insights and for academic researchers to become aware of the additional factors involved in algorithm adoption in real production systems.
The "Gen AI for E-commerce" workshop explores the role of Generative Artificial Intelligence in transforming e-commerce through enhanced user experience and operational efficiency. E-commerce companies grapple with multiple challenges such as lack of quality content for products, subpar user experience, sparse datasets etc. Gen AI offers significant potential to address these complexities. Yet, deploying these technologies at scale presents challenges such as hallucination in data, excessive costs, increased latency response, and limited generalization in sparse data environments. This workshop will bring together experts from academia and industry to discuss these challenges and opportunities, aiming to showcase case studies, breakthroughs, and insights into practical implementations of Gen AI in e-commerce.
Responsible AI is built upon a set of principles that prioritize fairness, transparency, accountability, and inclusivity in AI development and deployment. As AI systems become increasingly sophisticated, including the explosion of generative AI, there is a growing need to address the ethical considerations and potential societal impacts of their use. Knowledge graphs (KGs), as structured representations of information, can enhance generative AI performance by providing context, explaining outputs, and reducing biases, thereby offering a powerful framework for addressing the challenges of responsible AI. By leveraging semantic relationships and contextual understanding, KGs facilitate transparent decision-making, enabling stakeholders to trace and interpret the reasoning behind AI-driven outcomes. Moreover, they provide a means to capture and manage diverse knowledge sources, supporting the development of fair and unbiased AI models. The workshop investigates the role of knowledge graphs in promoting responsible AI principles and creates a cooperative space for researchers, practitioners, and policymakers to exchange insights, facilitating collaboration and advancing the understanding of how KGs can contribute to responsible AI solutions.
This workshop introduces generative AI applications for the enterprise, with a focus on retrieval-augmented generation (RAG) systems. Generative AI refers to models that can create new content and help solve complex problems. RAG is a generative AI technique that combines information retrieval with text generation: relevant documents are retrieved and supplied to the model as grounding context, yielding richer and more reliable responses. RAG systems can leverage enterprise data, which is often specific, structured, and dynamic, to provide customized solutions for various domains; however, enterprise data also poses challenges such as scalability, security, and data quality. This workshop convenes researchers and practitioners to explore RAG and other generative AI systems in real-world enterprise scenarios, fostering knowledge exchange, collaboration, and the identification of future directions. Relevant to the CIKM community, the workshop intersects with core areas of data science and machine learning, offering potential benefits across various domains.
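A minimal sketch of the retrieve-then-generate pattern is shown below. The toy document store, the `embed` function, and the `generate` placeholder are all hypothetical stand-ins (a deployed RAG system would use a real document index, a trained embedding model, and an actual LLM endpoint); the point is only to show how retrieved context is injected into the generation prompt.

```python
import numpy as np

# Toy enterprise "knowledge base"; in practice this would be a document store.
DOCS = [
    "Employees accrue 20 vacation days per year.",
    "Expense reports must be filed within 30 days.",
    "The VPN must be used when accessing internal systems remotely.",
]

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Stand-in embedding; retrieval quality depends entirely on a real encoder.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank documents by cosine similarity to the query and keep the top k.
    doc_vecs = np.stack([embed(d) for d in DOCS])
    scores = doc_vecs @ embed(query)
    return [DOCS[i] for i in np.argsort(-scores)[:k]]

def generate(prompt: str) -> str:
    # Placeholder for an LLM call (e.g., a hosted completion endpoint).
    return f"[LLM answer grounded in a prompt of {len(prompt)} characters]"

def rag_answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)

print(rag_answer("How many vacation days do employees get?"))
```

The same structure extends naturally to enterprise concerns named above: access control can be enforced in `retrieve`, and freshness handled by re-indexing the document store, which is why retrieval quality and data governance dominate RAG system design.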
Graphs are powerful analytic tools for modeling adversarial activities across a wide range of domains and applications. Examples include identifying and responding to threats and vulnerabilities in cybersecurity systems, strengthening the resilience and robustness of critical infrastructure, and combating covert illicit activities spanning domains such as finance, communication, and transportation. With the rapid development of generative AI, the lifecycle of adversarial activities has shortened and their throughput, such as the rate at which attacks are generated or deceptive signals are synthesized, has increased significantly. For instance, a malicious actor can generate a large number of malware variants to flood defense systems or create agents that disseminate misleading signals, obscuring their activities. Consequently, there is a pressing need for novel, effective technology to autonomously handle these adversarial activities and keep pace with evolving threats. The purpose of this workshop is to provide a forum to discuss emerging research problems and novel approaches in graph analysis for modeling adversarial activities in the age of generative AI.
The field of information retrieval has been significantly transformed by the integration of AI technologies. AI agents, especially those leveraging LLMs and vast computational power, have revolutionized how information is retrieved, processed, and presented. LLM agents, with advanced memory, reasoning, and planning capabilities, can perform complex tasks, engage in coherent conversations, and provide personalized responses. Despite these advancements, challenges remain in ensuring relevance and accuracy, mitigating biases, providing real-time responses, and maintaining data security. This workshop aims to explore these challenges, share innovative solutions, and discuss future directions. It will provide a platform that brings together researchers and practitioners to discuss the latest theoretical advancements and practical implementations of AI agents in information retrieval. Topics include AI in search, recommendation, and personalization systems. By gathering a diverse group of experts, the workshop seeks to deepen the understanding of AI agents in information retrieval, advance the field, and enhance its societal impact. Participants will gain insights into cutting-edge research and emerging trends while fostering knowledge exchange and collaboration within the community.