HT '22: Proceedings of the 33rd ACM Conference on Hypertext and Social Media

SESSION: Social web content, language and networks

Kronecker Decomposition for Knowledge Graph Embeddings

Caglar Demir
Julian Lienen
Axel-Cyrille Ngonga Ngomo

Knowledge graph embedding research has mainly focused on learning continuous representations of entities and relations tailored towards the link prediction problem. Recent results indicate an ever increasing predictive ability of current approaches on benchmark datasets. However, this effectiveness often comes at the cost of over-parameterization and increased computational complexity. The former induces extensive hyperparameter optimization to mitigate malicious overfitting. The latter magnifies the importance of winning the hardware lottery. Here, we investigate a remedy for the first problem. We propose a technique based on Kronecker decomposition to reduce the number of parameters in a knowledge graph embedding model, while retaining its expressiveness. Through Kronecker decomposition, large embedding matrices are split into smaller embedding matrices during the training process. Hence, embeddings of knowledge graphs are not plainly retrieved but reconstructed on the fly. The decomposition ensures that elementwise interactions between three embedding vectors are extended with interactions within each embedding vector. This implicitly reduces redundancy in embedding vectors and encourages feature reuse. To quantify the impact of applying Kronecker decomposition on embedding matrices, we conduct a series of experiments on benchmark datasets. Our experiments suggest that applying Kronecker decomposition on embedding matrices leads to an improved parameter efficiency on all benchmark datasets. Moreover, empirical evidence suggests that reconstructed embeddings entail robustness against noise in the input knowledge graph. To foster reproducible research, we provide an open-source implementation of our approach, including training and evaluation scripts as well as pre-trained models.
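
The idea of reconstructing embeddings on the fly from smaller matrices can be illustrated with a minimal sketch. The abstract does not specify the exact factorization, so the shapes, the helper names `kron` and `embedding_row`, and the toy matrices below are assumptions chosen for illustration only, not the authors' implementation.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def embedding_row(a, b, index):
    """Rebuild one row of kron(a, b) without materializing the full matrix."""
    p = len(b)  # number of rows in the second factor
    row_a, row_b = a[index // p], b[index % p]
    return [x * y for x in row_a for y in row_b]


# Hypothetical factors: a 2x2 and a 3x2 matrix stand in for a 6x4 embedding matrix.
A = [[1, 2], [3, 4]]
B = [[0, 5], [6, 7], [1, 1]]

full = kron(A, B)                      # 6 rows (entities) x 4 columns (dimensions)
stored = len(A) * len(A[0]) + len(B) * len(B[0])
dense = len(full) * len(full[0])
print(stored, dense)                   # 10 stored parameters instead of 24
print(embedding_row(A, B, 4))          # [18, 21, 24, 28]
```

In this toy setting the two factors hold 10 numbers while the matrix they represent holds 24, which is the kind of parameter saving the abstract refers to.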

Towards Proactively Forecasting Sentence-Specific Information Popularity within Online News Documents

Sayar Ghosh Roy
Anshul Padhi
Risubh Jain
Manish Gupta
Vasudeva Varma

Multiple studies have focused on predicting the prospective popularity of an online document as a whole, without paying attention to the contributions of its individual parts. We introduce the task of proactively forecasting popularities of sentences within online news documents solely utilizing their natural language content. We model sentence-specific popularity forecasting as a sequence regression task. For training our models, we curate InfoPop, the first dataset containing popularity labels for over 1.7 million sentences from over 50,000 online news documents. To the best of our knowledge, this is the first dataset automatically created using streams of incoming search engine queries to generate sentence-level popularity annotations. We propose a novel transfer learning approach involving sentence salience prediction as an auxiliary task. Our proposed technique coupled with a BERT-based neural model exceeds nDCG values of 0.8 for proactive sentence-specific popularity forecasting. Notably, our study presents a non-trivial takeaway: though popularity and salience are different concepts, transfer learning from salience prediction enhances popularity forecasting. We release InfoPop and make our code publicly available.
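
For readers unfamiliar with the nDCG metric mentioned above, the following is a minimal sketch of how it can be computed for the sentences of one document. The abstract does not state which nDCG variant is used, so the linear gain, the log2 discount, the absence of a rank cutoff, and the function names are all assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(predicted_scores, true_popularity):
    """Rank sentences by predicted score and compare against the ideal ranking."""
    order = sorted(
        range(len(predicted_scores)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    ranked = [true_popularity[i] for i in order]
    ideal_dcg = dcg(sorted(true_popularity, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical document with three sentences and made-up popularity labels.
labels = [3, 0, 1]
print(ndcg([0.9, 0.1, 0.5], labels))   # predicted order matches the labels
print(ndcg([0.1, 0.9, 0.5], labels))   # most popular sentence ranked last
```

A value of 1.0 means the predicted ordering of sentences matches the ordering by true popularity; lower values indicate that popular sentences were ranked too low.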

Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive Approach Using Transformers

Fedor Vitiugin
Carlos Castillo

Relevant and timely information collected from social media during crises can be an invaluable resource for emergency management. However, extracting this information remains a challenging task, particularly when dealing with social media postings in multiple languages. This work proposes a cross-lingual method for retrieving and summarizing crisis-relevant information from social media postings. We describe a uniform way of expressing various information needs through structured queries and a way of creating summaries answering those information needs. The method is based on multilingual transformer embeddings. Queries are written in one of the languages supported by the embeddings, and the extracted sentences can be in any of the other languages supported. Abstractive summaries are created by transformers. The evaluation, done by crowdsourcing evaluators and emergency management experts, and carried out on collections extracted from Twitter during five large-scale disasters spanning ten languages, shows the flexibility of our approach. The generated summaries are regarded as more focused, structured, and coherent than those of existing state-of-the-art methods, and experts compare them favorably against summaries created by those methods.

Data Bootstrapping Approaches to Improve Low Resource Abusive Language Detection for Indic Languages

Mithun Das
Somnath Banerjee
Animesh Mukherjee

Abusive language is a growing concern on many social media platforms. Repeated exposure to abusive speech has created physiological effects on the target users. Thus, the problem of abusive language should be addressed in all forms for online peace and safety. While extensive research exists in abusive speech detection, most studies focus on English. Recently, many smearing incidents have occurred in India, which provoked diverse forms of abusive speech in online space in various languages based on the geographic location. Therefore, it is essential to deal with such malicious content. In this paper, to bridge the gap, we demonstrate a large-scale analysis of multilingual abusive speech in Indic languages. We examine different interlingual transfer mechanisms and observe the performance of various multilingual models for abusive speech detection for eight different Indic languages. We also experiment to show how robust these models are to adversarial attacks. Finally, we conduct an in-depth error analysis by looking into the models’ misclassified posts across various settings. We have made our code and models public for other researchers.

ADAGIO — Automated Data Augmentation of Knowledge Graphs Using Multi-expression Learning

Kevin Dressler
Mohamed Ahmed Sherif
Axel-Cyrille Ngonga Ngomo

The creation of an RDF knowledge graph for a particular application commonly involves a pipeline of tools that transform a set of input data sources into an RDF knowledge graph in a process called dataset augmentation. The components of such augmentation pipelines often require extensive configuration to lead to satisfactory results. Thus, non-experts are often unable to use them. We present an efficient supervised algorithm based on genetic programming for learning knowledge graph augmentation pipelines of arbitrary length. Our approach uses multi-expression learning to learn augmentation pipelines able to achieve a high F-measure on the training data. Our evaluation suggests that our approach can efficiently learn a larger class of RDF dataset augmentation tasks than the state of the art while using only a single training example. Even on the most complex augmentation problem we posed, our approach consistently achieves an average F1-measure of 99% in under 500 iterations with an average runtime of 16 seconds.

Characterizing Sponsored Content in Facebook and Instagram

Emanuelle Azevedo Martins
Isadora Salles
Fabricio Benevenuto
Olga Goussevskaia

In this work we present a comparative analysis of influencer marketing evolution on Facebook and Instagram, spanning the pre and post Covid-19 pandemic onset periods. We collected and characterized a large-scale cross-platform dataset, comprised of 9.5 million sponsored posts. We analyzed the relative growth rates of the number of ads and of user engagement within different topics of interest, such as sports, retail, travel, and politics. We discuss which topics have been most impacted by the onset of the pandemic, both in terms of sponsored content supply and demand. With this work we hope to expand the understanding of influence dynamics on social networks and provide support for the development of more contextualized and effective branding strategies.

SpaceE: Knowledge Graph Embedding by Relational Linear Transformation in the Entity Space

Jinxing Yu
Yunfeng Cai
Mingming Sun
Ping Li

Translation distance based knowledge graph embedding (KGE) methods, such as TransE and RotatE, model the relation in knowledge graphs as translation or rotation in the vector space. Both translation and rotation are injective; that is, translating or rotating different vectors yields different results. In knowledge graphs, different entities may have a relation with the same entity; for example, many actors starred in one movie. Such a non-injective relation pattern cannot be well modeled by the translation or rotation operations in existing translation distance based KGE methods. To tackle the challenge, we propose a translation distance-based KGE method called SpaceE to model relations as linear transformations. The proposed SpaceE embeds both entities and relations in knowledge graphs as matrices and SpaceE naturally models non-injective relations with singular linear transformations. We theoretically demonstrate that SpaceE is a fully expressive model with the ability to infer multiple desired relation patterns, including symmetry, skew-symmetry, inversion, Abelian composition, and non-Abelian composition. Experimental results on link prediction datasets illustrate that SpaceE substantially outperforms many previous translation distance based knowledge graph embedding methods, especially on datasets with many non-injective relations. The code is available based on the PaddlePaddle deep learning platform https://www.paddlepaddle.org.cn/.
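
The contrast between injective translations and singular linear transformations can be seen in a small numeric sketch. For brevity it represents entities as 2-dimensional vectors rather than matrices, and the relation vector, the relation matrix, and the function names are hypothetical; it illustrates the general linear-algebra point, not the SpaceE model itself.

```python
def translate(v, r):
    """Translation by a relation vector r: distinct inputs stay distinct."""
    return [a + b for a, b in zip(v, r)]


def transform(m, v):
    """Apply a relation matrix m to an entity vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


# Two different "actor" entities (hypothetical embeddings).
actor_a = [2, 1]
actor_b = [2, 5]

# A translation can never send both actors to the same "movie" point.
r = [1, -1]
print(translate(actor_a, r), translate(actor_b, r))   # [3, 0] [3, 4]

# A singular matrix (determinant 0) collapses the second coordinate,
# so both actors are mapped to the same point.
starred_in = [[1, 0], [0, 0]]
print(transform(starred_in, actor_a), transform(starred_in, actor_b))   # [2, 0] [2, 0]
```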

SESSION: Digital humanities, culture and society

Links Of Darkness: Hypertext And Horror

Mark Bernstein
Stee McMorris

Category fiction adopts a formal narrative framework to explore topics of mutual interest to readers and writers. Originating as a means of assisting retail booksellers and movie theaters in their work of matching readers and writers, categories like “Mystery”, “Western”, and “Horror” have shaped modern storytelling. The frameworks that underlie category fiction are often confounded with their conventional surface characteristics. For example, mysteries are not puzzles, but rather interrogate how a damaged world can be understood and, with understanding, repaired. We observe that the framework of Horror is congruent to the affordances of literary hypertext. The technologies and trappings of hypertext itself share the slippery uncanniness and unheimlichkeit of other horror staples: mirrors, twins, rivers, and crossroads. Finally, it is intriguing that the history of hypertext and the World Wide Web itself falls neatly into the framework of horror.

Characterizing Vaccination Movements on YouTube in the United States and Brazil

Marcelo Sartori Locatelli
Josemar Caetano
Wagner Meira Jr.
Virgilio Almeida

In the context of the COVID-19 pandemic, social networks such as Facebook, Twitter, YouTube and Instagram stand out as important sources of information. Among those, YouTube, as the largest and most engaging online media consumption platform, has a large influence in the spread of information and misinformation, which makes it important to study how the platform deals with the problems that arise from disinformation, as well as how its users interact with different types of content. Considering that the United States (USA) and Brazil (BR) are two of the countries with the highest COVID-19 death tolls, we asked the following question: What are the nuances of vaccination campaigns in the two countries? With that in mind, we engage in a comparative analysis of pro and anti-vaccine movements on YouTube. We also investigate the role of YouTube in countering online vaccine misinformation in the USA and BR. To this end, we monitored the removal of vaccine-related content on the platform and also applied various techniques to analyze the differences in discourse and engagement in pro and anti-vaccine “comment sections”. We found that American anti-vaccine content tends to lead to considerably more toxic and negative discussion than its pro-vaccine counterpart while also leading to 18% higher user-user engagement, while Brazilian anti-vaccine content was significantly less engaging. We also found that anti-vaccine and pro-vaccine discourses are considerably different, as the former is associated with conspiracy theories (e.g. ccp), misinformation and alternative medicine (e.g. hydroxychloroquine), while the latter is associated with protective measures. Finally, it was observed that YouTube content removals are still insufficient, with only approximately 16% of the anti-vaccine content being removed by the end of the studied period, with the United States registering the highest percentage of removed anti-vaccine content (34%) and Brazil registering the lowest (9.8%).

Is there an Author in this Labyrinth?: Hypertext Fiction and Farrell's Textual Fallacy

Sam Brooker

In 2017 Professor of Literature John Farrell published The Varieties of Authorial Intention. Joining other dissenting voices past and present, this work addressed what the author considered a key tenet of mid- to late 20th century literary criticism: that reference to authorial intention is out of bounds, literary works being constituted by the text alone.

Hypertext fiction has its own complex relationship with the notion of intention. From earlier entanglement in post-structuralist approaches to network textuality and the potential for readers to evade authors via branching narratives, hypertext fiction emerged as a distinctive form of textuality that can express intention in unique and unexpected ways.

How effectively do the three modes of authorial intention Farrell identifies – communicative, artistic, practical – map to hypertext fiction both past and future? Can this model – devised in the context of linear print writing – accommodate the unique form of textuality represented by hypertext, with its own affordances and opportunities to express intent?

Hypertext’s meta-history: Documenting in-conference citations, authors and keyword data, 1987-2021

Mark W. R. Anderson
David Millard

Conferences such as ACM Hypertext have been running for many decades and the metadata on their collected publications represent a valuable scholarly meta-history on areas such as the community’s health, diversity, and changing interests. But the metadata about these papers is not readily available for analysis, and the data collection and cleaning tasks appear substantial. In this paper we attempt to explore this challenge using the ACM Hypertext series as a case study. Taking the ACM Digital Library as a starting point, and using a combination of manual and automatic methods, we have constructed and released a 3-star Open Dataset representing over 1000 publications by almost 2,500 authors. An initial analysis reveals a modestly-sized but robust conference, with a changing pattern of in-citations that co-occurs with the arrival of social media, and a relatively consistent but imbalanced gender ratio of authors that shows some signs of recent improvements. The challenges encountered included identifying discrete author names, potential issues with text retrieval from PDF, and a disparate set of author keywords that reveals an absence of a common vocabulary. These insights are the results of a hard-fought process that is made complex by an incomplete digital record and a lack of consistency in naming. This Hypertext case study thus reveals a serious shortfall in the way that scholarly activity is captured and described, and questions PDF as the primary method of recording publications. Addressing these issues would make further analysis more straightforward and would allow larger events (with orders of magnitude more data) to be analysed in a similar way.

The Impact of Non-Verbalization in Think-Aloud: Understanding Knowledge Gain Indicators Considering Think-Aloud Web Searches

Marcelo Tibau
Sean Wolfgand Matsui Siqueira
Bernardo Pereira Nunes

Web searching and knowledge gain are intertwined processes that share mental and physical activities at the core of both human cognition and hypertext theory, such as identifying, comparing, linking, and combining different subsets of existing or new information. As Web search engines have improved our ability to retrieve information across multiple sources, the need to understand how a user’s knowledge evolves through a Web search session has increased. Previous works focused on understanding the knowledge gained in Web searches by using think-aloud protocols. From the user’s verbalization of her searching procedures, it is possible to identify her cognitive processing. Notwithstanding, we argue that users’ searching and browsing behaviors should not be analyzed only through the verbalization periods, as usually accepted by think-aloud studies, since not all cognitive decisions are made consciously; some are unconscious or subconscious. Hence, it is possible to identify more knowledge gain than would be attainable by focusing solely on what was verbalized. In this sense, we evaluated the statistical significance level derived from the relationship between verbal and non-verbal search periods mapped from online information searching strategy indicators. Then, we identified a positive association regarding non-verbalization and some indicators related to knowledge gain concepts and discovered that the values of non-verbal periods tend to increase as the values of particular indicators related to knowledge gain also increase. The knowledge gain concepts were identified using constructs representing cognitive absorption, comprehension, elaboration, and memory. Concerning the impact of Think-Aloud on knowledge gain processes, we found out that verbalization does affect how participants handle their search tasks.
However, our result also showed a predominance of non-verbal periods during metacognitive-based searching activities, which may indicate that Think-Aloud protocols should not only rely on verbalization for indication of knowledge gain. Although verbalization may not disrupt the thought process, it might cut in on the cognitive process as the participant tries to explain her action while performing it. A search engine could use the identified indicators to account for the knowledge gained during search sessions, which would make it more adapted to identify user information needs and promote personalized information-adding.

Learning to Adapt Domain Shifts of Moral Values via Instance Weighting

Xiaolei Huang
Alexandra Wormley
Adam Cohen

Classifying moral values in user-generated text from social media is critical in understanding community cultures and interpreting user behaviors of social movements. Moral values and language usage can change across the social movements; however, text classifiers are usually trained in source domains of existing social movements and tested in target domains of new social issues without considering the variations. In this study, we examine domain shifts of moral values and language usage, quantify the effects of domain shifts on the morality classification task, and propose a neural adaptation framework via instance weighting to improve cross-domain classification tasks. The quantification analysis suggests a strong correlation between morality shifts, language usage, and classification performance. We evaluate the neural adaptation framework on a public Twitter dataset across 7 social movements and gain classification improvements up to 12.1%. Finally, we release a new dataset on the COVID-19 vaccine labeled with moral values and evaluate our approach on the new target domain. For the case study of the COVID-19 vaccine, our adaptation framework achieves up to 5.26% improvements over neural baselines. This is the first study to quantify impacts of moral shifts, propose an adaptive framework to model the shifts, and conduct a case study to model COVID-19 vaccine-related behaviors from moral values.

SESSION: Information exploration and visualisation

The Effects of Spatial Visualization versus Ranked Lists on Quality, Time Efficiency, and Interaction

Daniel Roßner
Claus Atzenbeck
Tom Gross

Hypertext systems support users in navigating structured data sets and finding relevant information. Various interaction and visualization concepts aim to give users better insight into the data set, by suggesting queries and visualizing elements of interest in a meaningful way. Ranked lists are very common to show some sort of priority, while spatial layouts often help users to trace relations in the data. Little research has been done in user studies that systematically show and reason about the differences of such spatial layouts and ranked lists. In this paper we report on a systematic comparison of a spatial visualization versus a ranked list layout. For this purpose, we conducted a between-subject study with 43 participants. One group performed a task with a system providing semantic visualization in 2D, the other group performed the same task with a ranked list. Both interfaces are very similar and only differ in how suggestions are visualized. The results show that users of the spatial layout finished their task in a shorter time and tended towards higher satisfaction. At the same time, they had more interactions with the system. Furthermore, we discuss some in-depth data of the test sessions, which show that the visualization influences the users’ behavior.

Enabling Convenient Online Collaborative Writing for Low Vision Screen Magnifier Users

Hae-Na Lee
Yash Prakash
Mohan Sunkara
I.V. Ramakrishnan
Vikas Ashok

Online collaborative editors have become increasingly prevalent in both professional and academic settings. However, little is known about how usable these editors are for low vision screen magnifier users, as existing research works have predominantly focused on blind screen reader users. An interview study revealed that it is arduous and frustrating for screen magnifier users to perform even the basic collaborative writing activities, such as addressing collaborators’ comments and reviewing document changes. Specific interaction challenges underlying these issues included excessive panning, content occlusion, large empty space patches, and frequent loss of context. To address these challenges, we developed MagDocs, a browser extension that assists screen magnifier users in conveniently performing collaborative writing activities on the Google Docs web application. MagDocs is rooted in two ideas: (i) a custom support interface that users can instantly access on demand and interact with collaborative interface elements, such as comments or collaborator edits, within the current magnifier viewport; and (ii) visual relationship preservation, where collaborative elements and the corresponding text in the document are shown close to each other within the magnifier viewport to minimize context loss and panning effort. A study with 15 low vision users showed that MagDocs significantly improved the overall user satisfaction and interaction experience, while also substantially reducing the time and effort to perform typical collaborative writing tasks.

Exploring the Feasibility of Crowd-Powered Decomposition of Complex User Questions in Text-to-SQL Tasks

Sara Salimzadeh
Ujwal Gadiraju
Claudia Hauff
Arie van Deursen

Natural Language Interfaces to Databases (NLIDB), also known as Text-to-SQL models, enable users with different levels of knowledge in Structured Query Language (SQL) to access relational databases without any programming effort. By translating natural language into SQL queries, not only do NLIDBs minimize the burden of memorizing the schema of databases and writing complex SQL queries, but they also allow non-experts to acquire information from databases in natural language. However, existing NLIDBs largely fail to translate natural language questions to SQL when they are complex, preventing them from being deployed in real-world scenarios and generalizing across unseen complex databases. In this paper, we explored the feasibility of decomposing complex user questions into multiple sub-questions, each with a reduced complexity, as a means to circumvent the problem of complex SQL generation. We investigated the feasibility of decomposing complex user questions in a manner that each sub-question is simple enough for existing NLIDBs to generate correct SQL queries, using non-expert crowd workers in juxtaposition with SQL experts. Through an empirical study on an NLIDB benchmark dataset, we found that crowd-powered decomposition of complex user questions led to an accuracy boost of an existing Text-to-SQL pipeline from 30% to 59% (96% accuracy boost). Similarly, decomposition by SQL experts resulted in boosting the accuracy to 76% (153% accuracy boost). Our findings suggest that crowd-powered decomposition can be a scalable alternative to producing the training data necessary to build machine learning models that can automatically decompose complex user questions, thereby improving Text-to-SQL pipelines.
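
The "accuracy boost" percentages above appear to be relative improvements over the 30% baseline rather than percentage-point differences. The short sketch below checks that reading; the function name is an assumption, and the small gap between the computed and reported values may stem from truncation or from unrounded accuracies that the abstract does not give.

```python
def relative_boost(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100


print(f"{relative_boost(30, 59):.1f}")   # 96.7, reported as 96%
print(f"{relative_boost(30, 76):.1f}")   # 153.3, reported as 153%
```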

The Magic of Carousels: Single vs. Multi-List Recommender Systems

Behnam Rahdari
Branislav Kveton
Peter Brusilovsky

Carousel-based interfaces with multiple topic-focused item lists have emerged as a de-facto standard for presenting recommendation results to end-users in real-life recommender systems. In this paper, we attempt to formalize and explain the “magic” power of carousel-based interfaces from a traditional hypertext perspective of navigability. By applying both formal analysis and a data-driven evaluation, we demonstrate and measure the benefits offered by the carousel-based organization of recommendations. We hope that this work will benefit researchers in both the hypertext and recommender systems communities, where the research on carousel-based interfaces is gaining popularity.

SESSION: Personalized Recommender Systems

The Effect of Recommendation Source and Justification on Professional Development Recommendations for High School Teachers

Lijie Guo
Christopher Flathmann
Reza Anaraky
Nathan McNeese
Bart Knijnenburg

This paper describes a study conducted in the process of building a recommender system that provides personalized professional development pathways for high school teachers seeking to increase their disciplinary knowledge and/or their teaching skills. A controlled experiment (N = 190) was conducted to study the effects of the presented justification for the recommendations (teachers’ needs vs. their interests) and the presented source of the recommendations (a human expert vs. an AI algorithm) on users’ perceptions of and experience with the system. Our results show an interaction effect between these two system aspects: users who are told that the recommendations are based on their interests have a better experience when the recommendations are presented as originating from an AI algorithm, while users who are told that the recommendations are based on their needs have a better experience when the recommendations are presented as originating from a human expert.

SESSION: Late Breaking Results

Erasing Labor with Labor: Dark Patterns and Lockstep Behaviors on Google Play

Ashwin Singh
Arvindh Arun
Pulak Malhotra
Pooja Desur
Ayushi Jain
Duen Horng Chau
Ponnurangam Kumaraguru

Google Play’s policy forbids the use of incentivized installs, ratings, and reviews to manipulate the placement of apps. However, there still exist apps that incentivize installs for other apps on the platform. To understand how install-incentivizing apps affect users, we examine their ecosystem through a socio-technical lens and perform a mixed-methods analysis of their reviews and permissions. Our dataset contains 319K reviews collected daily over five months from 60 such apps that cumulatively account for over 160.5M installs. We perform qualitative analysis of reviews to reveal various types of dark patterns that developers incorporate in install-incentivizing apps, highlighting their normative concerns at both user and platform levels. Permissions requested by these apps validate our discovery of dark patterns, with over 92% of apps accessing sensitive user information. We find evidence of fraudulent reviews on install-incentivizing apps, following which we model them as an edge stream in a dynamic bipartite graph of apps and reviewers. Our proposed reconfiguration of a state-of-the-art microcluster anomaly detection algorithm yields promising preliminary results in detecting this fraud. We discover highly significant lockstep behaviors exhibited by reviews that aim to boost the overall rating of an install-incentivizing app. Upon evaluating the 50 most suspicious clusters of boosting reviews detected by the algorithm, we find (i) near-identical pairs of reviews in 94% of them (47 clusters), and (ii) that over 35% of the reviews (1,687 of 4,717) form near-identical pairs within their cluster. Finally, we conclude with a discussion on how fraud is intertwined with labor and poses a threat to the trust and transparency of Google Play.
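
Checking a cluster of reviews for near-identical pairs can be sketched as follows. The abstract does not say how similarity was measured, so the use of `difflib.SequenceMatcher`, the 0.9 threshold, the function name, and the sample reviews are all assumptions made purely for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_identical_pairs(reviews, threshold=0.9):
    """Return index pairs of reviews whose text similarity meets the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(reviews), 2):
        # Case-insensitive character-level similarity in the range [0, 1].
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical cluster of reviews posted close together in time.
cluster = [
    "Great app, easy money!",
    "Great app, easy money!!",
    "Crashes on startup every time",
]
print(near_identical_pairs(cluster))   # [(0, 1)]
```

The pairwise comparison is quadratic in the cluster size, which is acceptable for small clusters such as those inspected manually but would need a more scalable approach for a full review stream.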

Exploring Semantically Interlaced Cultural Heritage Narratives

Noemi Mauro
Angelo Geninatti Cossatin
Ester Cravero
Liliana Ardissono
Guido Magnano
Marco Giardino

While traditional mobile guides propose itineraries underlying the presentation of individual narrations, a broad view of Cultural Heritage should take into account that Points of Interest, historical characters and objects are frequently related in different stories linking art, history and science. Moreover, stories could be associated through their common themes. Thus, a focus on individual narrations is not enough to provide users with a holistic view of the places they visit.

In this paper, we investigate the presentation of interlaced Cultural Heritage information to make users aware of the connections among such stories. For this purpose, we propose an exploration model that enables the user to take side walks in semantically-related narrations concerning Points of Interest. This is based on a semantic knowledge representation where two types of relations connect entities within individual stories, and stories through their common themes. Based on this representation, we developed the Triangolazioni mobile guide that presents multimedia information about Cultural Heritage in Torino city. A user study has shown that participants perceived the app, and its “side walking” support, as highly usable. Moreover, they appreciated the storytelling capabilities of the app.

Identifying neutral reviews from unlabeled data: An exploratory study on user ratings and word-level polarity scores

Salim Sazzed

Reviews containing mixed or contrasting opinions, also known as neutral reviews, are prevalent in user feedback data. By leveraging annotated data, supervised machine learning (ML) classifiers can learn implicit patterns to identify these neutral reviews. However, labeled data are barely available in most circumstances. When annotated data are unavailable, unsupervised approaches such as lexicon-based methods are employed that utilize word-level polarity scores with a set of rules. As a preliminary study for developing a sophisticated unsupervised framework for recognizing neutral reviews, here, we scrutinize the performances of the existing lexicon-based methods. When applied to four multi-domain review datasets, we observe that all of them perform poorly for identifying neutral reviews. We manually inspect the semantic attributes of a subset of neutral reviews misclassified by these lexicon-based methods. The experimental results and manual analysis reveal that determining neutrality utilizing the lexical rule-based methods is often ineffective due to numerous reasons, such as user preferences on certain aspects, coverage of the sentiment lexicon, irregularity in the efficacy of aggregation rules, and the context-sensitive polarity of words. As a preliminary study, this analysis reveals traits of neutral reviews and limitations of existing approaches and provides insights to develop methods for neutral review identification from the unlabeled data.
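
A lexicon-based method of the kind discussed above can be sketched in a few lines. The tiny lexicon, the summation rule, the margin, and the function name are hypothetical stand-ins rather than any of the methods evaluated in the study; the last example shows how a simple aggregation rule can label a mixed review as positive.

```python
# Hypothetical word-level polarity scores; real lexicons are far larger.
LEXICON = {"great": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}


def classify(review, lexicon=LEXICON, margin=0.25):
    """Sum word polarities and map the total to a label using a fixed margin."""
    words = (w.strip(".,!?") for w in review.lower().split())
    score = sum(lexicon.get(w, 0.0) for w in words)
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "neutral"


print(classify("Great screen, terrible battery."))          # neutral
print(classify("Good camera, great price!"))                # positive
print(classify("Great screen, great sound, bad battery"))   # positive, despite the mixed opinion
```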

Impact of Exogenous Biases of Instagram Posts on Park Visitation Estimation

Afra Mashhadi
Sana Suse
Susan Ammiri
Spencer Wood

Recent years have seen an increase in the use of social media for various decision-making purposes in the context of urban computing and smart cities, including management of public parks. However, as use of readily available social media becomes more mainstream, a critical concern that arises is the extent to which such data remains a valid proxy for people’s online and offline behavior over time. Existing literature has mostly concentrated on the endogenous elements of the biases of social media data corresponding to platform popularity across different demographics, but failed to address the exogenous factors. In this article, we conduct a longitudinal study of park visitors and the impact of the pandemic on park visitation in four US metropolitan areas. By leveraging data from Instagram and SafeGraph, we show the consequences of not accounting for both endogenous and exogenous biases that exist in approaches that rely on social media to estimate park visitation.

Revealing the Demographic Attributes of the Authors from the Abstracts of Scientific Articles

Salim Sazzed

This study presents multiple strategies to automatically reveal undisclosed demographic attributes of authors in double-blind submissions. From a limited amount of textual content of around 100-200 words excerpted from an abstract, this study aims to reveal the following pieces of information: i) the English language nativeness of the primary author, ii) the country of origin of the primary author, and iii) the gender of the primary author. We introduce an annotated dataset of over 5600 articles labeled with the native language, country of origin, and gender information of the primary authors. We employ classical machine learning (CML) algorithms with statistical n-gram features and transformer-based fine-tuned language models to determine various demographic attributes. We observe that transformer-based models yield slightly better performance for all three tasks. The transformer-based models achieve macro F1 scores close to 75% for identifying the English language nativeness of the primary authors. To determine the country of the non-native English authors, the fine-tuned transformer-based models obtain F1 scores of around 60% (10-class classification). For the gender prediction task, we attain F1 scores of 0.65 with the transformer-based models. The experimental results demonstrate that the fine-tuned language models and CML classifiers are capable of disclosing various author attributes with an acceptable level of accuracy that can undermine the blindness of double-blind submissions.
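For readers unfamiliar with the macro F1 metric reported in this abstract, the following is a minimal sketch of how it is commonly computed: the F1 score is calculated per class and then averaged with equal weight per class. The function name `macro_f1` and the toy labels are assumptions for illustration and are not taken from the study's code or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy example with made-up labels.
truth = ["native", "native", "non-native", "non-native"]
preds = ["native", "non-native", "non-native", "non-native"]
score = macro_f1(truth, preds)
```

Because every class contributes equally, macro F1 penalizes poor performance on rare classes more than accuracy does, which matters for imbalanced tasks such as a 10-class country prediction.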

Rites de Passage: Elucidating Displacement to Emplacement of Refugees on Twitter

Aparup Khatua
Wolfgang Nejdl

Prior refugee-related studies have primarily examined social media deliberations to probe societal opinions around a specific refugee event. In contrast, our study attempts to identify the various stages of the refugee journey from displacement to emplacement in the host nation. We draw insights from Gennep's seminal work, i.e., Les Rites de Passage, to identify four phases of the refugee journey: Arrival of Refugees, Temporal stay at Asylums, Rehabilitation, and Integration of Refugees into the host nation. To test our proposed framework, we have collected multimodal tweets from April 15, 2020, to March 15, 2021. A fusion of BERT+LSTM (for text inputs) and InceptionV4 (for image inputs) has reported an F1-score of 80.93%. Subsequently, to test the practical implication of our proposed model in real-time, we have considered the multimodal tweets related to the 2022 Ukrainian refugee crisis. An F1-score of 71.88% for this 2022 crisis confirms the generalizability of our proposed framework.

Understanding Effects of Moderation and Migration on Online Video Sharing Platforms

Gabriel Luis Santos Freire
Tales Panoutsos
Lucas Perez
Fabricio Benevenuto
Flavio Figueiredo

To mitigate the propagation of potentially dangerous information (e.g., fake news), social media platforms usually rely on the deletion or censoring of content (here called moderation). In this research, we measure how content moderation on YouTube affects a channel’s popularity. To achieve our goal, we use the altCensored platform to gather information on videos that were deleted from YouTube. We cross-reference this data with channel popularity time series from SocialBlade. After characterizing this novel dataset, we employ Regression Discontinuity Design (RDD) to measure impact. Using RDD, we categorize the impact of deletion into four patterns: (PP) channels with positive regression slopes (indicating growth) both before and after deletion; (PN) channels with positive growth before deletion and negative growth after, i.e., a deletion that inverts the trend; (NP) channels whose growth was decreasing before deletion and changed trend afterwards; and (NN) channels with negative slopes both before and after deletion. These groups represent 16% (PP), 26% (PN), 16% (NP), and 42% (NN) of our moderated videos. As a final result, we also show that videos may yet be found on other websites. The large number of events in (PP) and (NP) and the continued availability of videos on the Web indicate that moderation may not be as effective as it seems.
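The four slope patterns described in this abstract can be sketched as follows: fit one least-squares line to the popularity series before the deletion and another after it, then label the pair of slope signs. This is a simplified illustration and not the authors' RDD pipeline; the names `ols_slope` and `slope_pattern`, the equally spaced time steps, and the choice to treat a zero slope as negative are all assumptions.

```python
def ols_slope(ys):
    """Least-squares slope of ys against equally spaced time steps 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    return sxy / sxx


def slope_pattern(series, cutoff):
    """Return 'PP', 'PN', 'NP', or 'NN' for the slopes before/after index cutoff.

    series: channel popularity values at consecutive time steps.
    cutoff: index of the first observation after the deletion event.
    A slope of exactly zero is labeled 'N' here (an arbitrary assumption).
    """
    before = ols_slope(series[:cutoff])
    after = ols_slope(series[cutoff:])
    return ("P" if before > 0 else "N") + ("P" if after > 0 else "N")


# Toy series: growth up to the deletion, decline afterwards.
pattern = slope_pattern([1, 2, 3, 4, 3, 2, 1], cutoff=4)
```

A fuller RDD analysis would also estimate the jump at the cutoff and its uncertainty; the sketch only captures the sign comparison that defines the four groups.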

SESSION: Demos

EarlyAd: A System for Real-Time Surveillance of Brazilian Early Electoral Ads on Twitter

Marcelo M. R. Araújo
Samuel Guimarães
Marcio Silva
Josemar Caetano
Jonatas Santos
Julio C. S. Reis
Ana P. C. Silva
Fabricio Benevenuto
Jussara M. Almeida

The sheer volume of social media data produced daily brings challenges to the surveillance and detection of specific actions of interest, notably actions associated with infringement of regulations or laws in force in specific countries. One such case is the sharing of early electoral advertisements (ads), which is prohibited by law during political elections in Brazil. In this paper, we introduce EarlyAd, a system that performs, in real time, the collection, identification and analysis of early electoral advertisements on Twitter. Our tool was designed to bring more transparency and awareness to Brazilian society about this practice, uncovering common patterns associated with this kind of content, while also offering competent authorities evidence to facilitate law enforcement. Our tool is running online and is available at: http://earlyad.dcc.ufmg.br.

Telegram Monitor: Monitoring Brazilian Political Groups and Channels on Telegram

Manoel Júnior
Philipe Melo
Daniel Kansaon
Vitor Mafra
Kaio Sa
Fabricio Benevenuto

In this work, we present the “Telegram Monitor”, a web-based system that monitors the political debate on Telegram and enables the analysis of the most shared content in multiple channels and public groups. Our system aims to allow journalists, researchers, and fact-checking agencies to identify trending conspiracy theories and misinformation campaigns, or simply to monitor the political debate in this space during the 2022 Brazilian elections. We hope our system can help combat misinformation spreading through Telegram in Brazil. The following link contains a brief description of the system: https://bit.ly/3l4xNrF

SESSION: Blue Sky Ideas

Emotional Closeness by Means of Intelligent Thoughts and Memory Spaces

Claus Atzenbeck
Mark Bernstein
Sarah Diefenbach

This blue sky paper envisions a novel system which promotes emotional closeness through storytelling. Family members, who may be separated, collaboratively build a spatial hypertext of images and text fragments to express and structure their thoughts and memories. The system observes their reactions as well as their media while they work. Live recommendations prompt users in their thinking and storytelling. Family stories are thus collaboratively adapted to more tightly connect the thoughts and emotions of their loved ones.

From Users to (Sense)Makers: On the Pivotal Role of Stigmergic Social Annotation in the Quest for Collective Sensemaking

Ronen Tamari
Daniel Friedman
William Fischer
Lauren Hebert
Dafna Shahaf

The web has become a dominant epistemic environment, influencing people’s beliefs at a global scale. However, online epistemic environments are increasingly polluted, impairing societies’ ability to coordinate effectively in the face of global crises. We argue that centralized platforms are a main source of epistemic pollution, and that healthier environments require redesigning how we collectively govern attention. Inspired by decentralization and open source software movements, we propose Open Source Attention, a socio-technical framework for “freeing” human attention from control by platforms, through a decentralized ecosystem for creating, storing and querying stigmergic markers, the digital traces of human attention.

Hyperownership: Beyond the Current State of Interaction with Digital Property

Amaury Trujillo

The introduction of novel technology has oftentimes changed the concept of ownership. Non-fungible tokens are a recent example, as they allow a decentralized way to generate and verify proof of ownership via distributed ledger technology. Despite crucial uncertainties, these tokens have generated great enthusiasm for the future of digital property and its surrounding economy. In this regard, I think there is an untapped opportunity in applying a hypertext approach to augment such highly structured ownership-based associations. To this end, in this work I propose hyperownership, based on the premises that property is the law of lists and ledgers, and that hypertext is an apt method to inquire into such a ledger system. In spite of the significant risks and challenges to realize such a vision, I believe that it has great potential to transform the way in which we interact with digital property.

More Comfortable With Chaos: Using Hypertext to Shatter Echo Chambers and Promote Creativity

Dana McKay
Stephann Makri
George Buchanan

Chaos has considerable negative associations: it is perceived as frightening and destructive and best avoided. Many constructs on the internet are a reaction to an information deluge that is perceived to be chaotic and overwhelming. Search engines, with their consistent quest for the one right answer, are one such construct. Filter bubbles and echo chambers, which protect us from the chaos of others’ opinions, are another. While too much chaos is frightening and overwhelming, a little chaos can be useful. One way in which this occurs is by potentially preventing polarisation through exposing people to new views and ideas (which may even change their minds) on social media. Another way chaos can be useful is by facilitating discovery, both in creative thinking processes, which thrive on a wide (chaotic) variety of information, and through serendipity. This paper first examines the case for and against chaos, then sets out a chaos-driven research agenda for the hypertext community.

Personalized Interventions for Online Moderation

Stefano Cresci
Amaury Trujillo
Tiziano Fagni

Current online moderation follows a one-size-fits-all approach, where each intervention is applied in the same way to all users. This naïve approach is challenged by established socio-behavioral theories and by recent empirical results that showed the limited effectiveness of such interventions. We propose a paradigm shift in online moderation by moving towards a personalized and user-centered approach. Our multidisciplinary vision combines state-of-the-art theories and practices in diverse fields such as computer science, sociology and psychology, to design personalized moderation interventions (PMIs). In outlining the path leading to the next generation of moderation interventions, we also discuss the most prominent challenges introduced by such a disruptive change.

Robust metadata in multiple environments

Frode Hegland

We live in a time of wondrous technologies, but plain old academic articles remain hamstrung by clinging to the affordances of the paper media of the past. The affordances offered by digital media are not present in any depth; there is simply copy and paste and blue hyperlinks. The Visual-Meta approach makes even basic PDF documents contain rich metadata, which allows them to be active rather than passive knowledge components, and this can be done in any media in which the document can be rendered.

This paper presents Visual-Meta and explains what it is, what the basic benefits are, and how it can augment knowledge in augmented environments such as AR/VR, often today referred to as the ‘metaverse’.

The Web At War: Hypertext, Social Media, and Totalitarianism

Mark Bernstein

In 2022, much of the world faced the prospect of a prolonged conventional war with a totalitarian state. The origins of hypertext lie in the wars of the 20th Century, and efforts to avoid a repeated conflict, along with confidence that conflict could be contained if not entirely avoided, are deeply embedded into the architecture of the World Wide Web. The Web was not designed to confront a war, and it remains deeply vulnerable to totalitarian subversion. Our systems, platforms, and our discipline will need to adapt.

Weaponising Social Media for Information Divide and Warfare

Ehsan-Ul Haq
Gareth Tyson
Tristan Braud
Pan Hui

Social media is often used to disseminate information during crises, including wars, natural disasters and pandemics. This paper discusses the challenges faced during crisis situations, which social media can both contribute to and ameliorate. We discuss the role that information polarisation plays in exacerbating problems. We then discuss how certain malicious actors exploit these divides. We conclude by detailing future avenues of work that can help mitigate these issues.

SESSION: Workshops

5th Workshop on Human Factors in Hypertext (HUMAN ’22)

Claus Atzenbeck
Jessica Rubart

HUMAN ’22 is the fifth workshop of a series for the ACM Hypertext conferences. It has a strong focus on the user and thus is complementary to the strong machine analytics research direction that could be experienced in previous conferences.

The user-centric view on hypertext not only includes user interfaces and interaction, but also discussions about hypertext application domains. Furthermore, the workshop raises the question of how original hypertext ideas (e.g., Doug Engelbart’s “augmenting human intellect” or Frank Halasz’ “hypertext as a medium for thinking and communication”) can improve today’s hypertext systems.

SIDEWAYS-2022 @ HT-2022: 7th International Workshop on Social Media World Sensors

Luigi Di Caro
Claudio Schifanella
Mario Cataldi

This seventh edition of the workshop aims at bringing together academics and practitioners from different areas to promote the vision of social media as social sensors.

Nowadays, social media platforms represent freely-accessible information networks allowing registered (and unregistered) users to read, share and broadcast messages referring to a potentially-unlimited range of topics, while also exploiting the immediacy of handy smart devices. This long-running workshop aims at focusing attention on a particular perspective of these powerful communication channels, which is that of social sensors, where each user reacts in real time to the underlying reality by providing their own interpretation.

Technologies and AI artifacts may support automatic or semi-automatic applications for information detection and integration, offering sideways to the existing authoritative information media and the information reported by the surrounding community.

OASIS’22: 2nd International Workshop on Open Challenges in Online Social Networks

Barbara Guidi
Laura Ricci
Andrea Michienzi

Online Social Networks (OSNs) have become part of everyday life for many people around the world. They are one of the main channels through which information can spread at lightning speed. Thanks to this fact, people use them for the most disparate reasons, such as sources of information in place of newspapers, to receive emotional or technical support, or to share their ideas and opinions to satisfy their need for sociality.

Since their introduction, people have questioned these services because they are affected by several problems. These problems include: preservation of the users’ privacy, fake news diffusion, diffusion of illegal pieces of content, censorship vs free speech, economic value redistribution, security vs trust, and so on. The aim of this workshop is to help overcome these problems by setting up a platform for researchers to publish their contributions.

The contributions can address: innovative methods and algorithms for social graph mining, which can be helpful to develop more efficient information diffusion techniques; the problem of privacy and how it can be enforced in these systems, in particular the relation between security, trust and privacy, which is crucial in the scenario of OSNs; decentralisation and its impact on the implementation of social services; how Artificial Intelligence techniques that respect the privacy of the users can be implemented; and technologies that enable the metaverse.

NHT’22: Narrative and Hypertext 2022

Charlie Hargood
David Millard

NHT is a continuing workshop series associated with the ACM Hypertext conference. The workshop acts as a forum for discussion for the narrative systems community within the wider audience of the Hypertext conference. The workshop includes presentations from authors of accepted short research papers, invited talks, and Q&A and debate to provide a venue for important discussions of challenges and opportunities for members of the narrative and hypertext community. Since 2017 it has had a sister workshop in the form of Authoring for Interactive Storytelling (AIS), which runs at the International Conference on Interactive Digital Storytelling. This year NHT has ‘The Authoring Problem’ as a special theme. The NHT website for this workshop series can be found at http://nht.ecs.soton.ac.uk