WSDM '18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining

SESSION: Keynote Talks

A Call to Arms: Embrace Assistive AI Systems!

A quarter-century ago Web search stormed the world: within a few years the Web search box became a standard tool of daily life ready to satisfy informational, transactional, and navigational queries needed for some task completion. However, two recent ...

    On the Power of Massive Text Data

    The real-world big data is largely unstructured, dynamic, and interconnected, in the form of natural language text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers and practitioners rely on ...

    Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution

    Current machine learning systems operate, almost exclusively, in a statistical, or model-blind mode, which entails severe theoretical limits on their power and performance. Such systems cannot reason about interventions and retrospection and, therefore, ...

    Conversations, Machine Learning and Privacy: LinkedIn's Path Towards Transforming Interaction with Its Members

    At LinkedIn, we believe that having the right conversations with our members is key to unlocking economic opportunity for them. For us, these conversations are in a broader context than traditionally defined dialogues. A typical dialogue usually only ...

    From Search to Research: Direct Answers, Perspectives and Dialog

    Advances in artificial intelligence have improved machine understanding of speech, images, and natural language. This in turn has allowed us to greatly enhance the intelligence of products such as Bing and Cortana. This keynote describes our continuing ...

    Scalable Algorithms in the Age of Big Data and Network Sciences: Characterization, Primitives, and Techniques

    In the age of network sciences and machine learning, efficient algorithms are now in higher demand more than ever before. Big Data fundamentally challenges the classical notion of efficient algorithms: Algorithms that used to be considered efficient, ...

    SESSION: WSDM Cup 2018

    WSDM Cup 2018: Music Recommendation and Churn Prediction

    Excellent recommendation system facilitates users retrieving contents they like and, what's much more important - the contents they might like but they are not aware of yet. It will further increase the satisfaction of users and increase the retention ...

    SESSION: Technical Presentations

    Performance Analysis of a Privacy Constrained kNN Recommendation Using Data Sketches

    This paper evaluates two algorithms, BLIP and JLT, for creating differentially private data sketches of user profiles, in terms of their ability to protect a kNN collaborative filtering algorithm from an inference attack by third-parties. The ...

    Can you Trust the Trend?: Discovering Simpson's Paradoxes in Social Data

    We investigate how Simpson's paradox affects analysis of trends in social data. According to the paradox, the trends observed in data that has been aggregated over an entire population may be different from, and even opposite to, those of the underlying ...

    Deep Neural Architecture for Multi-Modal Retrieval based on Joint Embedding Space for Text and Images

    Recent advances in deep learning and distributed representations of images and text have resulted in the emergence of several neural architectures for cross-modal retrieval tasks, such as searching collections of images in response to textual queries ...

    A Discrete Choice Model for Subset Selection

    Multinomial logistic regression is a classical technique for modeling how individuals choose an item from a finite set of alternatives. This methodology is a workhorse in both discrete choice theory and machine learning. However, it is unclear how to ...

    Latent Cross: Making Use of Context in Recurrent Recommender Systems

    The success of recommender systems often depends on their ability to understand and make use of the context of the recommendation request. Significant research has focused on how time, location, interfaces, and a plethora of other contextual features ...

    Consistent Transformation of Ratio Metrics for Efficient Online Controlled Experiments

    We study ratio overall evaluation criteria (user behavior quality metrics) and, in particular, average values of non-user level metrics, that are widely used in A/B testing as an important part of modern Internet companies' evaluation instruments (e.g., ...

    Neural Graph Learning: Training Neural Networks Using Graphs

    Label propagation is a powerful and flexible semi-supervised learning technique on graphs. Neural networks, on the other hand, have proven track records in many supervised learning tasks. In this work, we propose a training framework with a graph-...

    Sketch 'Em All: Fast Approximate Similarity Search for Dynamic Data Streams

    Recommender systems are an integral part of many web applications. With increasingly larger user bases, scalability has become an important issue. Many of the most scalable algorithms with respect to both space and running times are based on locality ...

    Fast Coreset-based Diversity Maximization under Matroid Constraints

    Max-sum diversity is a fundamental primitive for web search and data mining. For a given set S of n elements, it returns a subset of k ≪ n representatives maximizing the sum of their pairwise distances, where distance models dissimilarity. An important ...

    Putting Data in the Driver's Seat: Optimizing Earnings for On-Demand Ride-Hailing

    On-demand ride-hailing platforms like Uber and Lyft are helping reshape urban transportation, by enabling car owners to become drivers for hire with minimal overhead. Although there are many studies that consider ride-hailing platforms holistically, ...

    Improving Negative Sampling for Word Representation using Self-embedded Features

    Although the word-popularity based negative sampler has shown superb performance in the skip-gram model, the theoretical motivation behind oversampling popular (non-observed) words as negative samples is still not well understood. In this paper, we ...

    Sequential Recommendation with User Memory Networks

    User preferences are usually dynamic in real-world recommender systems, and a user's historical behavior records may not be equally important when predicting his/her future interests. Existing recommendation algorithms -- including both shallow and deep ...

    VISIR: Visual and Semantic Image Label Refinement

    The social media explosion has populated the Internet with a wealth of images. There are two existing paradigms for image retrieval: 1) content-based image retrieval (CBIR), which has traditionally used visual features for similarity search (e.g., SIFT ...

    Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search

    This paper presents Conv-KNRM, a Convolutional Kernel-based Neural Ranking Model that models n-gram soft matches for ad-hoc search. Instead of exact matching query and document n-grams, Conv-KNRM uses Convolutional Neural Networks to ...

    Demographics and Dynamics of Mechanical Turk Workers

    We present an analysis of the population dynamics and demographics of Amazon Mechanical Turk workers based on the results of the survey that we conducted over a period of 28 months, with more than 85K responses from 40K unique participants. The ...

    Joint Generative-Discriminative Aggregation Model for Multi-Option Crowd Labels

    Although some crowdsourcing aggregation models have been introduced to aggregate noisy crowd labels, these models mostly consider single-option (i.e. discrete) crowd labels as the input variables, and are not compatible with multi-option (i.e. non-...

    Predicting Audio Advertisement Quality

    Online audio advertising is a particular form of advertising used abundantly in online music streaming services. In these platforms, which tend to host tens of thousands of unique audio advertisements (ads), providing high quality ads ensures a better ...

    Cognitive Biases in Crowdsourcing

    Crowdsourcing has become a popular paradigm in data curation, annotation and evaluation for many artificial intelligence and information retrieval applications. Considerable efforts have gone into devising effective quality control mechanisms that ...

    User Profiling through Deep Multimodal Fusion

    User profiling in social media has gained a lot of attention due to its varied set of applications in advertising, marketing, recruiting, and law enforcement. Among the various techniques for user modeling, there is fairly limited work on how to merge ...

    Orienteering Algorithms for Generating Travel Itineraries

    We study the problem of automatically and efficiently generating itineraries for users who are on vacation. We focus on the common case, wherein the trip duration is more than a single day. Previous efficient algorithms based on greedy heuristics suffer ...

    Unsubscription: A Simple Way to Ease Overload in Email

    The constant growth of machine-generated mail, which today consists of more than 90% of non-spam mail traffic, is a major contributor to information overload in email, where users become overwhelmed with a flood of messages from commercial entities. A ...

    Offline A/B Testing for Recommender Systems

    Online A/B testing evaluates the impact of a new technology by running it in a real production environment and testing its performance on a subset of the users of the platform. It is a well-known practice to run a preliminary offline evaluation on ...

    Care to Share?: Learning to Rank Personal Photos for Public Sharing

    With mobile devices, users are taking ever-growing numbers of photos every day. These photos are uploaded to social sites such as Facebook and Flickr, often automatically. Yet, the portion of these uploaded photos being publicly shared is low, and on a ...

    Identifying Informational vs. Conversational Questions on Community Question Answering Archives

    Questions on community question answering websites usually reflect one of two intents: learning information or starting a conversation. In this paper, we revisit this fundamental classification task of informational versus conversational questions, ...

    Robust Transfer Learning for Cross-domain Collaborative Filtering Using Multiple Rating Patterns Approximation

    Collaborative filtering techniques are a common approach for building recommendations, and have been widely applied in real recommender systems. However, collaborative filtering usually suffers from limited performance due to the sparsity of user-item ...

    Ballpark Crowdsourcing: The Wisdom of Rough Group Comparisons

    Crowdsourcing has become a popular method for collecting labeled training data. However, in many practical scenarios traditional labeling can be difficult for crowdworkers (for example, if the data is high-dimensional or unintuitive, or the labels are ...

    Collaborative Filtering via Additive Ordinal Regression

    Accurately predicting user preferences/ratings over items are crucial for many Internet applications, e.g., recommender systems, online advertising. In current main-stream algorithms regarding the rating prediction problem, discrete rating scores are ...

    Who Will Share My Image?: Predicting the Content Diffusion Path in Online Social Networks

    Content popularity prediction has been extensively studied due to its importance and interest for both users and hosts of social media sites like Facebook, Instagram, Twitter, and Pinterest. However, existing work mainly focuses on modeling popularity ...

    Listening to Chaotic Whispers: A Deep Learning Framework for News-oriented Stock Trend Prediction

    Stock trend prediction plays a critical role in seeking maximized profit from the stock investment. However, precise trend prediction is very difficult since the highly volatile and non-stationary nature of the stock market. Exploding information on the ...

    Exploring Expert Cognition for Attributed Network Embedding

    Attributed network embedding has been widely used in modeling real-world systems. The obtained low-dimensional vector representations of nodes preserve their proximity in terms of both network topology and node attributes, upon which different analysis ...

    Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval

    Neural IR models, such as DRMM and PACRR, have achieved strong results by successfully capturing relevance matching signals. We argue that the context of these matching signals is also important. Intuitively, when extracting, modeling, and combining ...

    Recommendation in Heterogeneous Information Networks Based on Generalized Random Walk Model and Bayesian Personalized Ranking

    Recommendation based on heterogeneous information network (HIN) is attracting more and more attention due to its ability to emulate collaborative filtering, content-based filtering, context-aware recommendation and combinations of any of these ...

    Fast and Scalable Distributed Loopy Belief Propagation on Real-World Graphs

    Given graphs with millions or billions of vertices and edges, how can we efficiently make inferences based on partial knowledge? Loopy Belief Propagation (LBP) is a graph inference algorithm widely used in various applications including social network ...

    Combating Crowdsourced Review Manipulators: A Neighborhood-Based Approach

    We propose a system called TwoFace to uncover crowdsourced review manipulators who target online review systems. A unique feature of TwoFace is its three-phase framework: (i) in the first phase, we intelligently sample actual evidence of manipulation (...

    Topic Chronicle Forest for Topic Discovery and Tracking

    To ease comprehension of given time-stamped corpora, we extend topic models to handle both the specificity and temporality of topics; this is a significant advance over previous models which fail to provide both views simultaneously. Our proposed model ...

    Leveraging the Crowd to Detect and Reduce the Spread of Fake News and Misinformation

    Online social networking sites are experimenting with the following crowd-powered procedure to reduce the spread of fake news and misinformation: whenever a user is exposed to a story through her feed, she can flag the story as misinformation and, if ...

    REV2: Fraudulent User Prediction in Rating Platforms

    Rating platforms enable large-scale collection of user opinion about items (e.g., products or other users). However, untrustworthy users give fraudulent ratings for excessive monetary gains. In this paper, we present REV2, a system to identify such ...

    Web Search of Fashion Items with Multimodal Querying

    In this paper, we introduce a novel multimodal fashion search paradigm where e-commerce data is searched with a multimodal query composed of both an image and text. In this setting, the query image shows a fashion product that the user likes and the ...

    Joint Non-negative Matrix Factorization for Learning Ideological Leaning on Twitter

    People are shifting from traditional news sources to online news at an incredibly fast rate. However, the technology behind online news consumption promotes content that confirms the users' existing point of view. This phenomenon has led to polarization ...

    Bayesian Optimization for Optimizing Retrieval Systems

    The effectiveness of information retrieval systems heavily depends on a large number of hyperparameters that need to be tuned. Hyperparameters range from the choice of different system components, e.g., stopword lists, stemming methods, or retrieval ...

    Streaming Link Prediction on Dynamic Attributed Networks

    Link prediction targets to predict the future node interactions mainly based on the current network snapshot. It is a key step in understanding the formation and evolution of the underlying networks; and has practical implications in many real-world ...

    Inferring Dockless Shared Bike Distribution in New Cities

    Recently, dockless shared bike services have achieved great success and reinvented bike sharing business in China. When expanding bike sharing business into a new city, most start-ups always wish to find out how to cover the whole city with a suitable ...

    Multi-Dimensional Network Embedding with Hierarchical Structure

    Information networks are ubiquitous in many applications. A popular way to facilitate the information in a network is to embed the network structure into low-dimension spaces where each node is represented as a vector. The learned representations have ...

    Query Driven Algorithm Selection in Early Stage Retrieval

    Large scale retrieval systems often employ cascaded ranking architectures, in which an initial set of candidate documents are iteratively refined and re-ranked by increasingly sophisticated and expensive ranking models. In this paper, we propose a ...

    Index Compression Using Byte-Aligned ANS Coding and Two-Dimensional Contexts

    We examine approaches used for block-based inverted index compression, such as the OptPFOR mechanism, in which fixed-length blocks of postings data are compressed independently of each other. Building on previous work in which asymmetric numeral systems ...

    Fusing Diversity in Recommendations in Heterogeneous Information Networks

    In the past, hybrid recommender systems have shown the power of exploiting relationships amongst objects which directly or indirectly effect the recommendation task. However, the effect of all relations is not equal, and choosing their right balance for ...

    Neural Personalized Ranking for Image Recommendation

    We propose a new model toward improving the quality of image recommendations in social sharing communities like Pinterest, Flickr, and Instagram. Concretely, we propose Neural Personalized Ranking (NPR) -- a personalized pairwise ranking model over ...

    Learning to Discover Domain-Specific Web Content

    The ability to discover all content relevant to an information domain has many applications, from helping in the understanding of humanitarian crises to countering human and arms trafficking. In such applications, time is of essence: it is crucial to ...

    Extreme Multi-label Learning with Label Features for Warm-start Tagging, Ranking & Recommendation

    The objective in extreme multi-label learning is to build classifiers that can annotate a data point with the subset of relevant labels from an extremely large label set. Extreme classification has, thus far, only been studied in the context of ...

    DSANLS: Accelerating Distributed Nonnegative Matrix Factorization via Sketching

    Nonnegative matrix factorization (NMF) has been successfully applied in different fields, such as text mining, image processing, and video analysis. NMF is the problem of determining two nonnegative low rank matrices U and V, for a given input matrix M, ...

    Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec

    Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned ...

    Curriculum Learning for Heterogeneous Star Network Embedding via Deep Reinforcement Learning

    Learning node representations for networks has attracted much attention recently due to its effectiveness in a variety of applications. This paper focuses on learning node representations for heterogeneous star networks, which have a center node type ...

    Leveraging Implicit Contribution Amounts to Facilitate Microfinancing Requests

    The emergence of online microfinancing platforms provides new opportunities for people to seek financial assistance from a large number of potential contributors. However, these platforms deal with a huge number of requests, making it hard for the ...

    FACH: Fast Algorithm for Detecting Cohesive Hierarchies of Communities in Large Networks

    Vertices in a real-world social network can be grouped into densely connected communities that are sparsely connected to other groups. Moreover, these communities can be partitioned into successively more cohesive communities. Despite an ever-growing ...

    Measuring the Latency of Depression Detection in Social Media

    Detecting depression is a key public health challenge, as almost 12% of all disabilities can be attributed to depression. Computational models for depression detection must prove not only that can they detect depression, but that they can do it early ...

    Peeling Bipartite Networks for Dense Subgraph Discovery

    Finding dense bipartite subgraphs and detecting the relations among them is an important problem for affiliation networks that arise in a range of domains, such as social network analysis, word-document clustering, the science of science, internet ...

    Short-Term Satisfaction and Long-Term Coverage: Understanding How Users Tolerate Algorithmic Exploration

    Any learning algorithm for recommendation faces a fundamental trade-off between exploiting partial knowledge of a user's interests to maximize satisfaction in the short term and discovering additional user interests to maximize satisfaction in the long ...

    CrossFire: Cross Media Joint Friend and Item Recommendations

    Friend and item recommendation on a social media site is an important task, which not only brings conveniences to users but also benefits platform providers. However, recommendation for newly launched social media sites is challenging because they often ...

    Modeling Time to Open of Emails with a Latent State for User Engagement Level

    Email messages have been an important mode of communication, not only for work, but also for social interactions and marketing. When messages have time sensitive information, it becomes relevant for the sender to know what is the expected time within ...

    Shortcutting Label Propagation for Distributed Connected Components

    Connected Components is a fundamental graph mining problem that has been studied for the PRAM, MapReduce and BSP models. We present a simple CC algorithm for BSP that does not mutate the graph, converges in O(log n) supersteps and scales to graphs of ...

    User Intent, Behaviour, and Perceived Satisfaction in Product Search

    As online shopping becomes increasingly popular, users perform more product search to purchase items. Previous studies have investigated people's online shopping behaviours and ways to predict online purchases. However, from a user perspective, there ...

    Logician: A Unified End-to-End Neural Approach for Open-Domain Information Extraction

    In this paper, we consider the problem of open information extraction (OIE) for extracting entity and relation level intermediate structures from sentences in open-domain. We focus on four types of valuable intermediate structures (Relation, Attribute, ...

    Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding

    Top-N sequential recommendation models each user as a sequence of items interacted in the past and aims to predict top-N ranked items that a user will likely interact in a "near future". The order of interaction implies that sequential patterns play an ...

    sSketch: A Scalable Sketching Technique for PCA in the Cloud

    Multidimensional data appear frequently in many web-related applications, e.g., product ratings, the bag-of-words representation of web pages, etc. Principal Component Analysis (PCA) has been widely used for discovering patterns in relationships among ...

    Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering

    The dominant neural architectures in question answer retrieval are based on recurrent or convolutional encoders configured with complex word matching layers. Given that recent architectural innovations are mostly new word interaction layers or attention-...

    SHINE: Signed Heterogeneous Information Network Embedding for Sentiment Link Prediction

    In online social networks people often express attitudes towards others, which forms massive sentiment links among users. Predicting the sign of sentiment links is a fundamental task in many areas such as personal advertising and public opinion ...

    A Unified Processing Paradigm for Interactive Location-based Web Search

    This paper studies the location-based web search and aims to build a unified processing paradigm for two purposes: (1) efficiently support each of the various types of location-based queries (kNN query, top-k spatial-textual query, etc.) on two major ...

    Position Bias Estimation for Unbiased Learning to Rank in Personal Search

    A well-known challenge in learning from click data is its inherent bias and most notably position bias. Traditional click models aim to extract the <query, document> relevance and the estimated bias is usually discarded after relevance is extracted. In ...

    A Path-constrained Framework for Discriminating Substitutable and Complementary Products in E-commerce

    In personalized recommendation, candidate generation plays an infrastructural role by retrieving candidates out of billions of items. During this process, substitutes and complements constitute two main classes of retrieved candidates: substitutable ...

    Customer Purchase Behavior Prediction from Payment Datasets

    With the advances in the development of mobile payments, a huge amount of payment data are collected by banks. User payment data offer a good dataset to depict customer behavior patterns. A comprehensive understanding of customers' purchase behavior is ...

    Tracing Fake-News Footprints: Characterizing Social Media Messages by How They Propagate

    When a message, such as a piece of news, spreads in social networks, how can we classify it into categories of interests, such as genuine or fake news? Classification of social media content is a fundamental task for social media mining, and most ...

    Indirect Supervision for Relation Extraction using Question-Answer Pairs

    Automatic relation extraction (RE) for types of interest is of great importance for interpreting massive text corpora in an efficient manner. For example, we want to identify the relationship "president_of" between entities "Donald Trump" and "United ...

    Why People Search for Images using Web Search Engines

    What are the intents or goals behind human interactions with image search engines? Knowing why people search for images is of major concern to Web image search engines because user satisfaction may vary as intent varies. Previous analyses of image ...

    OpenRec: A Modular Framework for Extensible and Adaptable Recommendation Algorithms

    With the increasing demand for deeper understanding of users' preferences, recommender systems have gone beyond simple user-item filtering and are increasingly sophisticated, comprised of multiple components for analyzing and fusing diverse information. ...

    Dynamic Word Embeddings for Evolving Semantic Discovery

    Word evolution refers to the changing meanings and associations of words throughout time, as a byproduct of human language evolution. By studying word evolution, we can infer social trends and language constructs over different periods of human history. ...

    Modelling Domain Relationships for Transfer Learning on Retrieval-based Question Answering Systems in E-commerce

    Nowadays, it is a heated topic for many industries to build automatic question-answering (QA) systems. A key solution to these QA systems is to retrieve from a QA knowledge base the most similar question of a given question, which can be reformulated as ...

    Review-Aware Answer Prediction for Product-Related Questions Incorporating Aspects

    In E-commerce sites, there are platforms for users to pose product-related questions and experienced customers may provide answers voluntarily. Among the questions asked by users, a large proportion of them are yes-no questions reflecting that users ...

    Neural Ranking Models with Multiple Document Fields

    Deep neural networks have recently shown promise in the ad-hoc retrieval task. However, such models have often been based on one field of the document, for example considering document title only or document body only. Since in practice documents ...

    Discrete Deep Learning for Fast Content-Aware Recommendation

    Cold-start problem and recommendation efficiency have been regarded as two crucial challenges in the recommender system. In this paper, we propose a hashing based deep learning framework called Discrete Deep Learning (DDL), to map users and items to ...

    Micro Behaviors: A New Perspective in E-commerce Recommender Systems

    The explosive popularity of e-commerce sites has reshaped users' shopping habits and an increasing number of users prefer to spend more time shopping online. This evolution allows e-commerce sites to observe rich data about users. The majority of ...

    Predicting Multi-step Citywide Passenger Demands Using Attention-based Neural Networks

    Predicting passenger pickup/dropoff demands based on historical mobility trips has been of great importance towards better vehicle distribution for the emerging mobility-on-demand (MOD) services. Prior works focused on predicting next-step passenger ...

    SESSION: Doctoral Presentations

    Event Mining over Distributed Text Streams

    This research presents a new set of techniques to deal with event mining from different text sources, a complex set of NLP tasks which aim to extract events of interest and their components including authors, targets, locations, and event categories. ...

    Connectivity in Complex Networks: Measures, Inference and Optimization

    Networks are ubiquitous in many high impact domains. Among the various aspects of network studies, connectivity is the one that plays important role in many applications (e.g., information dissemination, robustness analysis, community detection, etc.). ...

    Automatic Ranking of Information Retrieval Systems

    Typical information retrieval system evaluation requires expensive manually-collected relevance judgments of documents, which are used to rank retrieval systems. Due to the high cost associated with collecting relevance judgments and the ever-growing ...

    Mining Twitter for Fine-Grained Political Opinion Polarity Classification, Ideology Detection and Sarcasm Detection

    In this paper, we propose three models for socio-political opinion polarity classification of microblog posts. Firstly, a novel probabilistic model, Joint-Entity-Sentiment-Topic (JEST) model, which captures opinions as a combination of the target entity,...

    Beyond Who and What: Data Driven Approaches for User Characterization

    Social media and technology have drastically transformed the social and information networks around us. They have impacted how we communicate with others, search for information, and even how we express our personal opinions. Further, in this era of big ...

    Engagement and Incentives in Online Community: Observational Data, Prediction Models, and Field Experiments

    This proposal aims to study user engagement pattern and how different incentive mechanisms influence user behavior in online communities. Work in this proposal investigates the diverse behavior patterns that different individuals follow in various ...

    Exploiting Human Mobility Patterns for Point-of-Interest Recommendation

    Point-of-interest (POI) recommendation, which provides personalized recommendation of places to mobile users, is an important task in location-based social networks (LBSNs). Unlike traditional interest-oriented merchandise recommendation, POI ...

    DEMONSTRATION SESSION: Demonstrations

    Percolator: Scalable Pattern Discovery in Dynamic Graphs

    We demonstrate Percolator, a distributed system for graph pattern discovery in dynamic graphs. In contrast to conventional mining systems, Percolator advocates efficient pattern mining schemes that (1) support pattern detection with keywords; (2) integrate ...

    Conversational Semantic Search: Looking Beyond Web Search, Q&A and Dialog Systems

    User expectations of web search are changing. They are expecting search engines to answer questions, to be more conversational, and to offer means to complete tasks on their behalf. At the same time, to increase the breadth of tasks that personal ...

    Supporting Large-scale Geographical Visualization in a Multi-granularity Way

    Urban data (e.g., real estate data, crime data) often have multiple attributes which are highly geography-related. As the scale of data increases, directly visualizing millions of individual data points on top of a map would overwhelm users' ...

    Collabot: Personalized Group Chat Summarization

    In recent years, enterprise group chat collaboration tools, such as Slack, IBM's Watson Workspace and Microsoft Teams, have presented unprecedented growth. With all the potential benefits of these tools - productivity increase and improved group ...

    SESSION: Tutorials

    Influence Maximization in Online Social Networks

    Starting with the earliest studies showing that the spread of new trends, information, and innovations is closely related to the social influence exerted on people by their social networks, the research on social influence theory took off, providing ...

    Differential Privacy for Information Retrieval

    The concern for privacy is real for any research that uses user data. Information Retrieval (IR) is not an exception. Many IR algorithms and applications require the use of users' personal information, contextual information and other sensitive and ...

    Neural Networks for Information Retrieval

    Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many approaches to many IR problems. The amount of information available can be ...

    Tutorial on Metrics of User Engagement: Applications to News, Search and E-Commerce

    User engagement plays a central role in companies operating online services, such as search engines, news portals, e-commerce sites, and social networks. A main challenge is to leverage collected knowledge about the daily online behavior of millions of ...

    Network Science of Teams: Characterization, Prediction, and Optimization

    Teams are increasingly indispensable to achievements in any organization. Despite the organizations' substantial dependency on teams, fundamental knowledge about the conduct of team-enabled operations is lacking, especially at the social, cognitive and ...

    A Critical Review of Online Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries

    Online social data like user-generated content, expressed or implicit relations among people, and behavioral traces are at the core of many popular web applications and platforms, driving the research agenda of researchers in both academia and industry. ...

    Athlytics: Winning in Sports with Data

    Data and analytics have been part of the sports industry from as early as the 1870s, when the first boxscore in baseball was recorded. However, it is only recently that advanced data mining and machine learning techniques have been utilized for ...

    Mining Knowledge Graphs From Text

    Knowledge graphs have become an increasingly crucial component in machine intelligence systems, powering ubiquitous digital assistants and inspiring several large scale academic projects across the globe. Our tutorial explains why knowledge graphs are ...

    SESSION: Workshops

    The 5th International Workshop on Social Web for Disaster Management (SWDM'18): Collective Sensing, Trust, and Resilience in Global Crises

    During large-scale emergencies such as natural and man-made disasters, a massive amount of information is posted by the public in social media. Collecting, aggregating, and presenting this information to stakeholders can be extremely challenging, ...

    First Workshop on Knowledge Base Construction, Mining and Reasoning

    HeteroNAM: International Workshop on Heterogeneous Networks Analysis and Mining

    The first International Workshop on Heterogeneous Networks Analysis and Mining is held in Los Angeles, California, USA on February 9th, 2018 and is co-located with the 11th ACM International Conference on Web Search and Data Mining. The goal of this ...

    LearnIR: WSDM 2018 Workshop on Learning from User Interactions

    While users interact with online services (e.g., search engines, recommender systems, conversational agents), they leave behind fine-grained traces of interaction patterns. The ability to understand user behavior, record and interpret user interaction ...

    MIS2: Misinformation and Misbehavior Mining on the Web

    The Misinformation and Misbehavior Mining on the Web (MIS2) workshop is held in Los Angeles, California, USA on February 9, 2018, and is co-located with the 11th ACM International Conference on Web Search and Data Mining (WSDM 2018). The Web is a dynamic ecosystem ...

    Workshop on Two-sided Marketplace Optimization: Search, Pricing, Matching & Growth

    The 1st International Workshop on Two-sided Marketplace Optimization: Search, Pricing, Matching & Growth (TSMO) will be held in Los Angeles, California, USA on February 9th, 2018, co-located with the 11th ACM International Conference on Web Search and ...

    GTA3 2018: Workshop on Graph Techniques for Adversarial Activity Analytics

    Networks are natural analytic tools in modeling adversarial activities (e.g., human trafficking, illicit drug production, terrorist financial transactions) using different intelligence data sources. However, such activities are often covert and embedded ...

    IFUP: Workshop on Multi-dimensional Information Fusion for User Modeling and Personalization

    The recommendation system has become an important component in many real applications, ranging from e-commerce and music apps to video-sharing sites and online book stores. The key to a successful recommendation system lies in accurate user/item profiling. ...