A quarter-century ago Web search stormed the world: within a few years the Web search box became a standard tool of daily life ready to satisfy informational, transactional, and navigational queries needed for some task completion. However, two recent ...
Real-world big data is largely unstructured, dynamic, and interconnected, in the form of natural language text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers and practitioners rely on ...
Current machine learning systems operate, almost exclusively, in a statistical, or model-blind mode, which entails severe theoretical limits on their power and performance. Such systems cannot reason about interventions and retrospection and, therefore, ...
At LinkedIn, we believe that having the right conversations with our members is key to unlocking economic opportunity for them. For us, these conversations are in a broader context than traditionally defined dialogues. A typical dialogue usually only ...
Advances in artificial intelligence have improved machine understanding of speech, images, and natural language. This in turn has allowed us to greatly enhance the intelligence of products such as Bing and Cortana. This keynote describes our continuing ...
In the age of network sciences and machine learning, efficient algorithms are in higher demand than ever before. Big Data fundamentally challenges the classical notion of efficient algorithms: Algorithms that used to be considered efficient, ...
An excellent recommendation system helps users retrieve content they like and, more importantly, content they might like but are not yet aware of. It will further increase the satisfaction of users and increase the retention ...
This paper evaluates two algorithms, BLIP and JLT, for creating differentially private data sketches of user profiles, in terms of their ability to protect a kNN collaborative filtering algorithm from an inference attack by third-parties. The ...
We investigate how Simpson's paradox affects analysis of trends in social data. According to the paradox, the trends observed in data that has been aggregated over an entire population may be different from, and even opposite to, those of the underlying ...
Recent advances in deep learning and distributed representations of images and text have resulted in the emergence of several neural architectures for cross-modal retrieval tasks, such as searching collections of images in response to textual queries ...
Multinomial logistic regression is a classical technique for modeling how individuals choose an item from a finite set of alternatives. This methodology is a workhorse in both discrete choice theory and machine learning. However, it is unclear how to ...
The success of recommender systems often depends on their ability to understand and make use of the context of the recommendation request. Significant research has focused on how time, location, interfaces, and a plethora of other contextual features ...
We study ratio overall evaluation criteria (user behavior quality metrics) and, in particular, average values of non-user level metrics, that are widely used in A/B testing as an important part of modern Internet companies' evaluation instruments (e.g., ...
Label propagation is a powerful and flexible semi-supervised learning technique on graphs. Neural networks, on the other hand, have proven track records in many supervised learning tasks. In this work, we propose a training framework with a graph-...
Recommender systems are an integral part of many web applications. With increasingly larger user bases, scalability has become an important issue. Many of the most scalable algorithms with respect to both space and running times are based on locality ...
Max-sum diversity is a fundamental primitive for web search and data mining. For a given set S of n elements, it returns a subset of k ≪ n representatives maximizing the sum of their pairwise distances, where distance models dissimilarity. An important ...
On-demand ride-hailing platforms like Uber and Lyft are helping reshape urban transportation, by enabling car owners to become drivers for hire with minimal overhead. Although there are many studies that consider ride-hailing platforms holistically, ...
Although the word-popularity based negative sampler has shown superb performance in the skip-gram model, the theoretical motivation behind oversampling popular (non-observed) words as negative samples is still not well understood. In this paper, we ...
User preferences are usually dynamic in real-world recommender systems, and a user's historical behavior records may not be equally important when predicting his/her future interests. Existing recommendation algorithms -- including both shallow and deep ...
The social media explosion has populated the Internet with a wealth of images. There are two existing paradigms for image retrieval: 1) content-based image retrieval (CBIR), which has traditionally used visual features for similarity search (e.g., SIFT ...
This paper presents Conv-KNRM, a Convolutional Kernel-based Neural Ranking Model that models n-gram soft matches for ad-hoc search. Instead of exact matching query and document n-grams, Conv-KNRM uses Convolutional Neural Networks to ...
We present an analysis of the population dynamics and demographics of Amazon Mechanical Turk workers based on the results of the survey that we conducted over a period of 28 months, with more than 85K responses from 40K unique participants. The ...
Although some crowdsourcing aggregation models have been introduced to aggregate noisy crowd labels, these models mostly consider single-option (i.e. discrete) crowd labels as the input variables, and are not compatible with multi-option (i.e. non-...
Online audio advertising is a particular form of advertising used abundantly in online music streaming services. In these platforms, which tend to host tens of thousands of unique audio advertisements (ads), providing high quality ads ensures a better ...
Crowdsourcing has become a popular paradigm in data curation, annotation and evaluation for many artificial intelligence and information retrieval applications. Considerable efforts have gone into devising effective quality control mechanisms that ...
User profiling in social media has gained a lot of attention due to its varied set of applications in advertising, marketing, recruiting, and law enforcement. Among the various techniques for user modeling, there is fairly limited work on how to merge ...
We study the problem of automatically and efficiently generating itineraries for users who are on vacation. We focus on the common case, wherein the trip duration is more than a single day. Previous efficient algorithms based on greedy heuristics suffer ...
The constant growth of machine-generated mail, which today constitutes more than 90% of non-spam mail traffic, is a major contributor to information overload in email, where users become overwhelmed with a flood of messages from commercial entities. A ...
Online A/B testing evaluates the impact of a new technology by running it in a real production environment and testing its performance on a subset of the users of the platform. It is a well-known practice to run a preliminary offline evaluation on ...
With mobile devices, users are taking ever-growing numbers of photos every day. These photos are uploaded to social sites such as Facebook and Flickr, often automatically. Yet, the portion of these uploaded photos being publicly shared is low, and on a ...
Questions on community question answering websites usually reflect one of two intents: learning information or starting a conversation. In this paper, we revisit this fundamental classification task of informational versus conversational questions, ...
Collaborative filtering techniques are a common approach for building recommendations, and have been widely applied in real recommender systems. However, collaborative filtering usually suffers from limited performance due to the sparsity of user-item ...
Crowdsourcing has become a popular method for collecting labeled training data. However, in many practical scenarios traditional labeling can be difficult for crowdworkers (for example, if the data is high-dimensional or unintuitive, or the labels are ...
Accurately predicting user preferences/ratings over items is crucial for many Internet applications, e.g., recommender systems and online advertising. In current mainstream algorithms for the rating prediction problem, discrete rating scores are ...
Content popularity prediction has been extensively studied due to its importance and interest for both users and hosts of social media sites like Facebook, Instagram, Twitter, and Pinterest. However, existing work mainly focuses on modeling popularity ...
Stock trend prediction plays a critical role in seeking maximized profit from stock investment. However, precise trend prediction is very difficult owing to the highly volatile and non-stationary nature of the stock market. Exploding information on the ...
Attributed network embedding has been widely used in modeling real-world systems. The obtained low-dimensional vector representations of nodes preserve their proximity in terms of both network topology and node attributes, upon which different analysis ...
Neural IR models, such as DRMM and PACRR, have achieved strong results by successfully capturing relevance matching signals. We argue that the context of these matching signals is also important. Intuitively, when extracting, modeling, and combining ...
Recommendation based on heterogeneous information network (HIN) is attracting more and more attention due to its ability to emulate collaborative filtering, content-based filtering, context-aware recommendation and combinations of any of these ...
Given graphs with millions or billions of vertices and edges, how can we efficiently make inferences based on partial knowledge? Loopy Belief Propagation (LBP) is a graph inference algorithm widely used in various applications including social network ...
We propose a system called TwoFace to uncover crowdsourced review manipulators who target online review systems. A unique feature of TwoFace is its three-phase framework: (i) in the first phase, we intelligently sample actual evidence of manipulation (...
To ease comprehension of given time-stamped corpora, we extend topic models to handle both the specificity and temporality of topics; this is a significant advance over previous models which fail to provide both views simultaneously. Our proposed model ...
Online social networking sites are experimenting with the following crowd-powered procedure to reduce the spread of fake news and misinformation: whenever a user is exposed to a story through her feed, she can flag the story as misinformation and, if ...
Rating platforms enable large-scale collection of user opinion about items (e.g., products or other users). However, untrustworthy users give fraudulent ratings for excessive monetary gains. In this paper, we present REV2, a system to identify such ...
In this paper, we introduce a novel multimodal fashion search paradigm where e-commerce data is searched with a multimodal query composed of both an image and text. In this setting, the query image shows a fashion product that the user likes and the ...
People are shifting from traditional news sources to online news at an incredibly fast rate. However, the technology behind online news consumption promotes content that confirms the users' existing point of view. This phenomenon has led to polarization ...
The effectiveness of information retrieval systems heavily depends on a large number of hyperparameters that need to be tuned. Hyperparameters range from the choice of different system components, e.g., stopword lists, stemming methods, or retrieval ...
Link prediction aims to predict future node interactions mainly based on the current network snapshot. It is a key step in understanding the formation and evolution of the underlying networks, and has practical implications in many real-world ...
Recently, dockless shared bike services have achieved great success and reinvented the bike-sharing business in China. When expanding the bike-sharing business into a new city, most start-ups wish to find out how to cover the whole city with a suitable ...
Information networks are ubiquitous in many applications. A popular way to facilitate the information in a network is to embed the network structure into low-dimension spaces where each node is represented as a vector. The learned representations have ...
Large scale retrieval systems often employ cascaded ranking architectures, in which an initial set of candidate documents are iteratively refined and re-ranked by increasingly sophisticated and expensive ranking models. In this paper, we propose a ...
We examine approaches used for block-based inverted index compression, such as the OptPFOR mechanism, in which fixed-length blocks of postings data are compressed independently of each other. Building on previous work in which asymmetric numeral systems ...
In the past, hybrid recommender systems have shown the power of exploiting relationships amongst objects which directly or indirectly affect the recommendation task. However, the effect of all relations is not equal, and choosing their right balance for ...
We propose a new model toward improving the quality of image recommendations in social sharing communities like Pinterest, Flickr, and Instagram. Concretely, we propose Neural Personalized Ranking (NPR) -- a personalized pairwise ranking model over ...
The ability to discover all content relevant to an information domain has many applications, from helping in the understanding of humanitarian crises to countering human and arms trafficking. In such applications, time is of the essence: it is crucial to ...
The objective in extreme multi-label learning is to build classifiers that can annotate a data point with the subset of relevant labels from an extremely large label set. Extreme classification has, thus far, only been studied in the context of ...
Nonnegative matrix factorization (NMF) has been successfully applied in different fields, such as text mining, image processing, and video analysis. NMF is the problem of determining two nonnegative low rank matrices U and V, for a given input matrix M, ...
Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned ...
Learning node representations for networks has attracted much attention recently due to its effectiveness in a variety of applications. This paper focuses on learning node representations for heterogeneous star networks, which have a center node type ...
The emergence of online microfinancing platforms provides new opportunities for people to seek financial assistance from a large number of potential contributors. However, these platforms deal with a huge number of requests, making it hard for the ...
Vertices in a real-world social network can be grouped into densely connected communities that are sparsely connected to other groups. Moreover, these communities can be partitioned into successively more cohesive communities. Despite an ever-growing ...
Detecting depression is a key public health challenge, as almost 12% of all disabilities can be attributed to depression. Computational models for depression detection must prove not only that they can detect depression, but that they can do it early ...
Finding dense bipartite subgraphs and detecting the relations among them is an important problem for affiliation networks that arise in a range of domains, such as social network analysis, word-document clustering, the science of science, internet ...
Any learning algorithm for recommendation faces a fundamental trade-off between exploiting partial knowledge of a user's interests to maximize satisfaction in the short term and discovering additional user interests to maximize satisfaction in the long ...
Friend and item recommendation on a social media site is an important task, which not only brings conveniences to users but also benefits platform providers. However, recommendation for newly launched social media sites is challenging because they often ...
Email messages have been an important mode of communication, not only for work, but also for social interactions and marketing. When messages have time sensitive information, it becomes relevant for the sender to know what is the expected time within ...
Connected Components is a fundamental graph mining problem that has been studied for the PRAM, MapReduce and BSP models. We present a simple CC algorithm for BSP that does not mutate the graph, converges in O(log n) supersteps and scales to graphs of ...
As online shopping becomes increasingly popular, users perform more product searches to purchase items. Previous studies have investigated people's online shopping behaviours and ways to predict online purchases. However, from a user perspective, there ...
In this paper, we consider the problem of open information extraction (OIE) for extracting entity and relation level intermediate structures from sentences in open-domain. We focus on four types of valuable intermediate structures (Relation, Attribute, ...
Top-N sequential recommendation models each user as a sequence of items interacted with in the past and aims to predict the top-N ranked items that a user will likely interact with in the "near future". The order of interaction implies that sequential patterns play an ...
Multidimensional data appear frequently in many web-related applications, e.g., product ratings, the bag-of-words representation of web pages, etc. Principal Component Analysis (PCA) has been widely used for discovering patterns in relationships among ...
The dominant neural architectures in question answer retrieval are based on recurrent or convolutional encoders configured with complex word matching layers. Given that recent architectural innovations are mostly new word interaction layers or attention-...
In online social networks people often express attitudes towards others, which forms massive sentiment links among users. Predicting the sign of sentiment links is a fundamental task in many areas such as personal advertising and public opinion ...
This paper studies the location-based web search and aims to build a unified processing paradigm for two purposes: (1) efficiently support each of the various types of location-based queries (kNN query, top-k spatial-textual query, etc.) on two major ...
A well-known challenge in learning from click data is its inherent bias and most notably position bias. Traditional click models aim to extract the <query, document> relevance and the estimated bias is usually discarded after relevance is extracted. In ...
In personalized recommendation, candidate generation plays an infrastructural role by retrieving candidates out of billions of items. During this process, substitutes and complements constitute two main classes of retrieved candidates: substitutable ...
With the advances in the development of mobile payments, a huge amount of payment data are collected by banks. User payment data offer a good dataset to depict customer behavior patterns. A comprehensive understanding of customers' purchase behavior is ...
When a message, such as a piece of news, spreads in social networks, how can we classify it into categories of interests, such as genuine or fake news? Classification of social media content is a fundamental task for social media mining, and most ...
Automatic relation extraction (RE) for types of interest is of great importance for interpreting massive text corpora in an efficient manner. For example, we want to identify the relationship "president_of" between entities "Donald Trump" and "United ...
What are the intents or goals behind human interactions with image search engines? Knowing why people search for images is of major concern to Web image search engines because user satisfaction may vary as intent varies. Previous analyses of image ...
With the increasing demand for deeper understanding of users' preferences, recommender systems have gone beyond simple user-item filtering and are increasingly sophisticated, comprised of multiple components for analyzing and fusing diverse information. ...
Word evolution refers to the changing meanings and associations of words throughout time, as a byproduct of human language evolution. By studying word evolution, we can infer social trends and language constructs over different periods of human history. ...
Building automatic question-answering (QA) systems is currently a hot topic in many industries. A key component of these QA systems is to retrieve from a QA knowledge base the question most similar to a given question, which can be reformulated as ...
In E-commerce sites, there are platforms for users to pose product-related questions and experienced customers may provide answers voluntarily. Among the questions asked by users, a large proportion of them are yes-no questions reflecting that users ...
Deep neural networks have recently shown promise in the ad-hoc retrieval task. However, such models have often been based on one field of the document, for example considering document title only or document body only. Since in practice documents ...
The cold-start problem and recommendation efficiency are regarded as two crucial challenges in recommender systems. In this paper, we propose a hashing-based deep learning framework called Discrete Deep Learning (DDL), to map users and items to ...
The explosive popularity of e-commerce sites has reshaped users' shopping habits and an increasing number of users prefer to spend more time shopping online. This evolution allows e-commerce sites to observe rich data about users. The majority of ...
Predicting passenger pickup/dropoff demands based on historical mobility trips has been of great importance towards better vehicle distribution for the emerging mobility-on-demand (MOD) services. Prior works focused on predicting next-step passenger ...
This research presents a new set of techniques to deal with event mining from different text sources, a complex set of NLP tasks which aim to extract events of interest and their components including authors, targets, locations, and event categories. ...
Networks are ubiquitous in many high impact domains. Among the various aspects of network studies, connectivity is the one that plays an important role in many applications (e.g., information dissemination, robustness analysis, community detection, etc.). ...
Typical information retrieval system evaluation requires expensive manually-collected relevance judgments of documents, which are used to rank retrieval systems. Due to the high cost associated with collecting relevance judgments and the ever-growing ...
In this paper, we propose three models for socio-political opinion polarity classification of microblog posts. Firstly, a novel probabilistic model, Joint-Entity-Sentiment-Topic (JEST) model, which captures opinions as a combination of the target entity,...
Social media and technology have drastically transformed the social and information networks around us. They have impacted how we communicate with others, search for information, and even how we express our personal opinions. Further, in this era of big ...
This proposal aims to study user engagement patterns and how different incentive mechanisms influence user behavior in online communities. Work in this proposal investigates the diverse behavior patterns that different individuals follow in various ...
Point-of-interest (POI) recommendation, which provides personalized recommendation of places to mobile users, is an important task in location-based social networks (LBSNs). Unlike traditional interest-oriented merchandise recommendation, POI ...
We demonstrate Percolator, a distributed system for graph pattern discovery in dynamic graphs. In contrast to conventional mining systems, Percolator advocates efficient pattern mining schemes that (1) support pattern detection with keywords; (2) integrate ...
User expectations of web search are changing. They are expecting search engines to answer questions, to be more conversational, and to offer means to complete tasks on their behalf. At the same time, to increase the breadth of tasks that personal ...
Urban data (e.g., real estate data, crime data) often have multiple attributes which are highly geography-related. As the scale of data increases, directly visualizing millions of individual data points on top of a map would overwhelm users' ...
In recent years, enterprise group chat collaboration tools, such as Slack, IBM's Watson Workspace and Microsoft Teams, have presented unprecedented growth. With all the potential benefits of these tools - productivity increase and improved group ...
Starting with the earliest studies showing that the spread of new trends, information, and innovations is closely related to the social influence exerted on people by their social networks, the research on social influence theory took off, providing ...
The concern for privacy is real for any research that uses user data. Information Retrieval (IR) is not an exception. Many IR algorithms and applications require the use of users' personal information, contextual information and other sensitive and ...
Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many approaches to many IR problems. The amount of information available can be ...
User engagement plays a central role in companies operating online services, such as search engines, news portals, e-commerce sites, and social networks. A main challenge is to leverage collected knowledge about the daily online behavior of millions of ...
Teams are increasingly indispensable to achievements in any organization. Despite the organizations' substantial dependency on teams, fundamental knowledge about the conduct of team-enabled operations is lacking, especially at the social, cognitive and ...
Online social data like user-generated content, expressed or implicit relations among people, and behavioral traces are at the core of many popular web applications and platforms, driving the research agenda of researchers in both academia and industry. ...
Data and analytics have been part of the sports industry from as early as the 1870s, when the first boxscore in baseball was recorded. However, it is only recently that advanced data mining and machine learning techniques have been utilized for ...
Knowledge graphs have become an increasingly crucial component in machine intelligence systems, powering ubiquitous digital assistants and inspiring several large scale academic projects across the globe. Our tutorial explains why knowledge graphs are ...
During large-scale emergencies such as natural and man-made disasters, a massive amount of information is posted by the public in social media. Collecting, aggregating, and presenting this information to stakeholders can be extremely challenging, ...
The first International Workshop on Heterogeneous Networks Analysis and Mining is held in Los Angeles, California, USA on February 9th, 2018 and is co-located with the 11th ACM International Conference on Web Search and Data Mining. The goal of this ...
While users interact with online services (e.g. search engines, recommender systems, conversational agents), they leave behind fine-grained traces of interaction patterns. The ability to understand user behavior, record and interpret user interaction ...
The Misinformation and Misbehavior Mining on the Web (MIS2) workshop is held in Los Angeles, California, USA on February 9, 2018, and co-located with the 11th ACM International Conference on Web Search and Data Mining (WSDM 2018). The Web is a dynamic ecosystem ...
The 1st International Workshop on Two-sided Marketplace Optimization: Search, Pricing, Matching & Growth (TSMO) will be held in Los Angeles, California, USA on February 9th, 2018, co-located with the 11th ACM International Conference on Web Search and ...
Networks are natural analytic tools in modeling adversarial activities (e.g., human trafficking, illicit drug production, terrorist financial transaction) using different intelligence data sources. However, such activities are often covert and embedded ...
Recommendation systems have become an important component of many real applications, ranging from e-commerce and music apps to video-sharing sites and online bookstores. The key to a successful recommendation system lies in accurate user/item profiling. ...