OASIS '22: Proceedings of the 2022 Workshop on Open Challenges in Online Social Networks

Full Citation in the ACM Digital Library

SESSION: Paper Session 1 - Privacy and Diffusion in Social Media

Assessing User Privacy on Social Media: The Twitter Case Study

Giovanni Livraga
Alessandro Motta
Marco Viviani

At the time of writing, nearly four billion people worldwide employ social media platforms such as Facebook, Instagram, WeChat, and TikTok to share content of various kinds, which may also include personal data. In addition, users interact with members of the virtual community, leaving behind important behavioral traces. In most cases, people do not have a full understanding of who will be able to access and use such a body of information, and for what purposes. Although social platforms provide users with some tools to protect their privacy, the very nature of these technologies and the psychological characteristics of users often lead them to ignore such solutions.

To address this issue, in this paper we propose a model for assessing the privacy of users on social media by identifying the critical aspects associated with the content and interactions they generate on such platforms. In particular, this model considers distinct features, of different kinds, that capture the level of users’ exposure with respect to privacy. These features, mapped into a vector space, are used to derive a score that expresses, in a measurable way, the privacy risk of users given the information available on social media about them. The proposed model is instantiated and tested on data collected from the microblogging platform Twitter, and the results of the experimental evaluation are analyzed. Specifically, the model is tested in three scenarios: a binary scenario, in which users’ privacy is evaluated as at risk or not; a multi-class scenario, in which their privacy is evaluated against different risk ranges; and a ranking scenario, in which users are ranked according to their privacy assessment.
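The three evaluation scenarios above can be illustrated with a minimal sketch. It assumes each user already has a privacy-risk score in the range [0, 1]; the abstract does not say how the score is computed, so the scores, the 0.5 threshold, the class boundaries, and all function names below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def binary_label(score: float, threshold: float = 0.5) -> str:
    """Binary scenario: is the user's privacy at risk or not?

    The 0.5 threshold is an assumption made for illustration only.
    """
    return "at risk" if score >= threshold else "not at risk"


def risk_class(score: float, bounds: Tuple[float, float] = (0.33, 0.66)) -> str:
    """Multi-class scenario: map the score to one of three risk ranges.

    The boundaries are hypothetical; the paper's ranges are not given here.
    """
    low, high = bounds
    if score < low:
        return "low"
    if score < high:
        return "medium"
    return "high"


def rank_users(scores: Dict[str, float]) -> List[str]:
    """Ranking scenario: order users from highest to lowest risk score."""
    return sorted(scores, key=lambda user: scores[user], reverse=True)


# Made-up scores for three hypothetical users.
scores = {"user_a": 0.82, "user_b": 0.15, "user_c": 0.47}

labels = {user: binary_label(s) for user, s in scores.items()}
classes = {user: risk_class(s) for user, s in scores.items()}
ranking = rank_users(scores)
```

The same score feeds all three views, which is why a single scoring model can be evaluated as a classifier and as a ranker.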

From Sentiment to Sensitivity: The Role of Emotions on Privacy Exposure in Twitter

Lindrit Kqiku
Marvin Kühn
Delphine Reinhardt

Online Social Networks (OSNs) are a vital part of users’ daily lives. Users increasingly share content in OSNs, and do so in various emotional states. In this work, we explore the role of emotions in privacy exposure and integrate it as an additional learning parameter in tweet sensitivity recognition. To this end, we first use BERT-based classification techniques to recognize six basic emotions in tweets. Using our trained sentiment model, we then perform sentiment inference on a sensitivity dataset and integrate the sentiment into the BERT classification model to classify the tweets according to their sensitivity. We then compare the results of the standard sensitivity recognition models (which use the tweets only) against the extended model that integrates the sentiment features into sensitivity recognition. We demonstrate that including sentiment features in sensitivity analysis increases the F1 score by about 3 percentage points over our base sensitivity classification, i.e., from 83.96% to 87.01%. We further demonstrate a correlation between the emotions anger and disgust and sensitive tweets, as well as between joy and surprise and non-sensitive tweets.
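As a reading aid for the reported improvement, the following sketch recalls the standard F1 definition (the harmonic mean of precision and recall) and separates the absolute gain in percentage points from the relative gain. Only the two F1 values come from the abstract; the function and variable names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# F1 scores reported in the abstract, in percent.
base_f1 = 83.96      # sensitivity model using tweets only
extended_f1 = 87.01  # sensitivity model with sentiment features

# Absolute gain, in percentage points.
absolute_gain = extended_f1 - base_f1

# Relative gain, as a percentage of the baseline F1.
relative_gain = absolute_gain / base_f1 * 100

print(f"absolute gain: {absolute_gain:.2f} percentage points")
print(f"relative gain: {relative_gain:.2f}% of the baseline")
```

The "about 3%" in the abstract corresponds to the absolute difference between the two scores.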

Not Only Degree Matters: Diffusion-Driven Role Recognition

Susanna Pozzoli
Sarunas Girdzijauskas

Graphs are a data structure that lends itself to representing a wide range of entities connected by relationships. Insights into such entities are learned by graph clustering models that group nodes by either communities or roles. While community detection methods divide vertices into clusters with more significant internal than external connectivity, role discovery algorithms divide nodes by maximizing the similarity in the connectivity structure. Even though both are clusters of vertices, communities and roles excel at different tasks, such as link prediction and anomaly detection, respectively.

Many role discovery algorithms explicitly or implicitly regard the degree as the most discriminating node feature. Methods that depend on how many neighbors a node has work very well for graphs in which the intra-role patterns of connectivity are equivalent. However, in this research paper, we show that structurally similar nodes with different degrees can be mislabeled by existing models since the connectivity structure is similar yet not equivalent.

To address this, we present Diffusion-Driven Role Recognition (D2-R2), an unsupervised learning model designed to account for structurally similar nodes differing in degree, which is important for, e.g., social networks. Firstly, we compute a diffusion matrix in such a way as to explore the neighborhoods of the vertices without emphasizing differences in degree. From this, we extract the diffusion patterns that summarize the connectivity structure of the nodes. Then, we compute the distance between them via Dynamic Time Warping (DTW) and assign a given number of roles by running k-means. Tests on both synthetic graphs and non-synthetic networks show that D2-R2 outperforms methods such as RolX, struc2vec, and GraphWave by up to 21.2% in accuracy and 35.3% in F1 score for graphs in which there are differences in degree between structurally similar nodes.
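The distance step can be illustrated with a textbook Dynamic Time Warping routine. This is a minimal sketch, not the authors' implementation: it assumes each node's diffusion pattern is a plain sequence of numbers and uses the absolute difference as the local cost, both of which are assumptions. The function name and the example patterns are hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic-programming DTW between two numeric sequences.

    cost[i][j] holds the minimal cumulative cost of aligning the first i
    elements of `a` with the first j elements of `b`.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Two made-up diffusion patterns with the same shape but different lengths.
pattern_u = [1.0, 2.0, 3.0]
pattern_v = [1.0, 2.0, 2.0, 3.0]
distance_uv = dtw_distance(pattern_u, pattern_v)
```

Because DTW allows one element to align with several, sequences that follow the same shape at different rates can end up at a small distance, which is the property a comparison of diffusion patterns would rely on. The resulting pairwise distances would then be passed to the clustering step.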

SESSION: Paper Session 2 - Sentiment Analysis and Accessibility

A study on Accessibility of Google ReCAPTCHA Systems

Ombretta Gaggi

Web sites and social media should be developed keeping in mind the widest range of users, regardless of their abilities or disabilities. Instead, they often use CAPTCHAs to prevent robotic access to data, even though these tests rely on sight: the ability to see an image or a text and to recognize or understand its content is used to tell a human being from a bot. This study shows that Google reCAPTCHA v2 discriminates against users with visual impairments, while reCAPTCHA v3 does not and is therefore currently the best available solution from an accessibility point of view.

Endorsement Analysis of Migrant-related Deliberations on YouTube: Prior to and During 2022 Ukrainian crisis

Aparup Khatua
Wolfgang Nejdl

Extant literature has noted that migrant-related deliberations on social media platforms are primarily associated with negative sentiments. However, the literature has rarely probed whether these negative sentiments get endorsed by other users and, if so, whether this depends on who the migrants are, especially if they are cultural others. The 2022 Ukrainian refugee crisis allows probing these intricate issues. We have analyzed 110,803 (prior to this 2022 crisis) and 21,453 (during this crisis) migrant-related comments on the YouTube platform. Specifically, we investigate the relationship between user endorsement and the sentiments of these comments. Both datasets indicate that users endorse comments with positive sentiments and reveal a negative propensity to endorse hate speech, i.e., comments that use swear words. However, the analysis of the recent dataset reveals a negative propensity to endorse comments with negative sentiments, whereas the earlier dataset indicates a positive propensity. Thus, the endorsement pattern of comments with negative sentiments may depend on who the migrants are!

COVID-19 Vaccine Brand Sentiment on Twitter

Alina Campan
Traian Marius Truta
Shawn Huesman
Vamsi Meda
Jake Anderson

Online social networks (OSNs) are today a primary way to spread and consume information. Perhaps the most important aspect of OSNs, both an opportunity and a weakness, is that they are open: users can post anything, which leads to a proliferation of information with various degrees of truthfulness. This impacts the volume of information, the trending topics, and the sentiment of users vis-à-vis these topics. Our goal in this work is to analyze the spread of information on Twitter, volume-wise and sentiment-wise (positive or negative), for COVID-19 vaccines overall and for some specific brands. Our analysis was carried out over five 10-day time windows in 2021, from February to October. We also looked at the most popular tweets we collected during our predefined time windows and, by looking at their retweet counts, observed how they trended over time.