In 1922, the 'Record Treasury' of the Public Record Office of Ireland in Dublin was destroyed in the opening engagement of Ireland's Civil War. The Treasury contained millions of historical documents filling 100,000 square feet of shelving organised into 5,500 series of records accumulated over seven centuries. It was destroyed in one afternoon.
Beyond 2022 is an international collaborative research project based at the ADAPT Centre, Trinity College Dublin, and funded by the Government of Ireland. We are working to create a virtual reimagining of this lost national archive. Many millions of words from destroyed documents will be linked and reassembled from copies, transcripts and other records scattered among the collections of our archival partners. We will bring together this rich array of replacement items within an immersive 3-D reconstruction of the destroyed building.
In this keynote address, we will discuss the Digital Humanities and Knowledge Engineering challenges presented by the project, and also reflect on how this reimagining of a lost archive will provide deeper search and discoverability than was possible one hundred years ago when the archive was still in existence.
One important contribution commonly ascribed to hypertext is the ability to combine different forms of expression, and so be considered 'multimodal' (or, at least, 'multimedial'). On closer analysis, however, theorizing just what this entails has remained limited. Similarly to the situation that long held concerning 'written' texts, it is too easily assumed that different modalities, sometimes labelled with terms such as 'text' or 'image', combine 'naturally' and so users should be able to follow such combinations with relative ease. Research on literacy, particularly with respect to contemporary media configurations, has shown this assumption to be false.
Constructing coherent interpretations of combinations of modalities can be far from straightforward, even when supported by good interface design; with poor design, which from the perspective of displayed 'documents' is unfortunately rather common, finding intended interpretations can present significant challenges. Now, when translated to the even more complex medial environment of hypertext, these potential problems are magnified considerably. Moreover, traditional considerations of where the 'boundaries' of hypertext might lie are now being redrawn as hypertext and the increasingly 'hyper'-connected medial world become increasingly permeable. The entire multimodal world of social media and participatory digital cultures might then be considered from a hypertext perspective, but research on hypertext itself lacks conceptual tools with the power necessary to engage with that world. Simple 'extensions' of traditional notions of hypertext are likely to prove insufficient for a full-blown account of multimodality.
In this talk we address these concerns from the perspective of current developments in multimodality studies, where the starting point is communication as such, regardless of the expressive forms that are used for that communication and whether communication is mediated computationally, via interlinked artefacts and pathways, or by cross-linked practices of digital and non-digital use. In short, current medial practices demand that hypertext be seen not simply as, for example, a shift from page-based documents to video, but as a further computationally supported environment for the development and deployment of core multimodal theoretical constructs such as semiotic modes, media and genres. We introduce these concepts and show several practical examples of processing from ongoing projects with a variety of media.
Online presence is becoming unavoidable for politicians worldwide. In countries such as the UK, Twitter has become the platform of choice, with over 85% (553 of 650) of the Members of Parliament (MPs) having an active online presence. Whereas this has allowed ordinary citizens unprecedented and immediate access to their elected representatives, it has also led to serious concerns about online hate towards MPs. This work attempts to shed light on the problem using a dataset of conversations between MPs and non-MPs over a two-month period. Deviating from other approaches in the literature, our data captures entire threads of conversations between Twitter handles of MPs and citizens in order to provide a full context for content that may be flagged as 'hate'. By combining widely-used hate speech detection tools trained on several widely available datasets, we analyse 2.5 million tweets to identify hate speech against MPs and we characterise hate across multiple dimensions of time, topics and MPs' demographics. We find that MPs are subject to intense 'pile-on' hate by citizens, whereby they receive more hate when they are already dealing with a high volume of mentions regarding some event or situation. We also show that hate is denser around certain topics and that MPs who have an ethnic minority background and those holding positions in Government receive more hate than other MPs. We find evidence of citizens expressing negative sentiments while engaging in cross-party conversations, with supporters of one party (e.g. Labour) directing hate against MPs of another party (e.g. Conservative).
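As a rough illustration of the tool-combination step described above (not the authors' exact pipeline), the following Python sketch flags a tweet as hate only when a majority of independently trained detectors agree; the toy keyword classifiers stand in for real trained models and the threshold is an assumption.

```python
# Minimal sketch: flag a tweet as hate only when a majority of
# independently trained classifiers agree (labels are 0/1).
# The classifiers below are placeholders, not the tools used in the paper.
from typing import Callable, List

def majority_hate_label(tweet: str,
                        classifiers: List[Callable[[str], int]],
                        threshold: float = 0.5) -> int:
    """Return 1 (hate) if at least `threshold` of the classifiers flag the tweet."""
    votes = [clf(tweet) for clf in classifiers]
    return int(sum(votes) / len(votes) >= threshold)

# Example with two toy keyword-based stand-ins for trained models.
toy_a = lambda t: int("idiot" in t.lower())
toy_b = lambda t: int(any(w in t.lower() for w in ("idiot", "scum")))
print(majority_hate_label("You are an idiot", [toy_a, toy_b]))  # -> 1
```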
Has hypertext killed off both the form and value of the manuscript? Digital authoring first and web authoring later have drastically changed the availability and type of traces that reflect both creative and editorial processes. In this light, the consolidated approaches of manuscript studies, which involve the analysis of material artefacts, are challenged. While new methodologies such as digital "forensics" and "virtual desks" are emerging, the nature and relations of native-digital manuscripts are yet to be fully investigated. This contribution accounts for digital artefacts within the field of manuscript studies, identifying parallels between material manuscripts and hypertext features in their value as documents. The mapping between digital and material artefacts outlines a theory of manuscript "transmediations", identifying where and how manuscript cues are reflected in digital technologies. This theory is developed through case studies and analyses of digital transitions. In the discussion, we highlight key challenges and future directions for scholarly editions of digital manuscripts. Lastly, we elaborate the requirements of a hypertext "genre" for digital manuscripts that supports reconciling the open-ended collaborative process of curation with the need for a coherent narrative addressed to the broader public.
In this paper, we advance the current state-of-the-art method for debiasing monolingual word embeddings so as to generalize well in a multilingual setting. We consider different methods to quantify bias and different debiasing approaches for monolingual as well as multilingual settings. We demonstrate the significance of our bias-mitigation approach on downstream NLP applications. Our proposed methods establish state-of-the-art performance for debiasing multilingual embeddings for three Indian languages (Hindi, Bengali, and Telugu) in addition to English. We believe that our work will open up new opportunities for building unbiased downstream NLP applications that are inherently dependent on the quality of the word embeddings used.
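For readers unfamiliar with embedding debiasing, the following sketch shows one common projection-based approach (Bolukbasi-style hard debiasing), in which a bias direction is estimated from definitional word pairs and projected out of the remaining vectors. It illustrates the general family of methods only, not the paper's multilingual procedure; the toy vectors are assumptions.

```python
# Hedged sketch of projection-based ("hard") debiasing: remove the component
# of each word vector along a bias direction estimated from definitional pairs.
import numpy as np

def bias_direction(pairs, emb):
    """Estimate a 1-D gender direction from (female, male) word pairs."""
    diffs = [emb[f] - emb[m] for f, m in pairs]
    d = np.mean(diffs, axis=0)
    return d / np.linalg.norm(d)

def debias(emb, direction, exclude=()):
    """Project the bias direction out of every vector not in `exclude`."""
    out = {}
    for w, v in emb.items():
        out[w] = v if w in exclude else v - np.dot(v, direction) * direction
    return out

# Toy 3-d embeddings purely for illustration.
emb = {"she": np.array([1.0, 0.2, 0.0]), "he": np.array([-1.0, 0.2, 0.0]),
       "doctor": np.array([-0.4, 0.9, 0.1])}
d = bias_direction([("she", "he")], emb)
emb_db = debias(emb, d, exclude=("she", "he"))
print(np.dot(emb_db["doctor"], d))  # ~0: "doctor" no longer aligns with the gender axis
```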
Control (Remedy Entertainment) and Hellblade: Senua's Sacrifice (Ninja Theory) demonstrate the potential for game design that defies expectations of immersive gameplay and embodied avatars. Building on game scholarship that recognizes 'immersion' as a "double-axis of incorporation" [8] consisting of a "complex interplay of actual and virtual worlds as perceived through a dually embodied player" [26, p. 73], we can see how these games achieve powerful moments of coattention through outmersive game design, deliberately alienating the player from an embodied avatar experience. Outmersion, a term coined by Gonzalo Frasca, offers a broader categorization for games that procedurally engender "critical distance" by directing player attention to and outside of the game itself [16]. This article uses close-play to explore how the characters of Jesse Faden in Control and Senua in Hellblade make use of the 'coinhabited avatar' trope, in which the avatar is possessed by non-player entities. This article identifies shared outcomes in the outmersive design of these characters, namely that they: 1) directly invoked the player, 2) complicated the player's place in the avatar body, 3) deceived the player, 4) took agency from the player, and 5) referenced game structures directly. Through outmersion, these games created provocative moments of player attention and reflection, simultaneously interrogating assumptions of power, rules, and embodiment. This article advocates for further exploration of outmersive game and interactive narrative design to challenge dominant presumptions about player-avatar interactions.
A representation learning method is considered stable if it consistently generates similar representations of the given data across multiple runs. Word Embedding Methods (WEMs) are a class of representation learning methods that generate a dense vector representation for each word in the given text data. The central idea of this paper is to explore the stability measurement of WEMs using intrinsic evaluation based on word similarity. We experiment with three popular WEMs: Word2Vec, GloVe, and fastText. For stability measurement, we investigate the effect of five parameters involved in training these models. We perform experiments using four real-world datasets from different domains: Wikipedia, News, Song lyrics, and European parliament proceedings. We also observe the effect of WEM stability on two downstream tasks: clustering and fairness evaluation. Our experiments indicate that amongst the three WEMs, fastText is the most stable, followed by GloVe and Word2Vec.
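One widely used intrinsic stability measure of this kind compares a word's top-k nearest neighbours across two training runs of the same model. The sketch below (plain NumPy, with embeddings represented as word-to-vector dictionaries) is a minimal illustration under those assumptions, not the paper's exact protocol.

```python
# Hedged sketch: stability of a word as the overlap of its top-k nearest
# neighbours across two runs of the same embedding method.
import numpy as np

def top_k_neighbours(word, emb, k=10):
    """Top-k cosine-similar words to `word` in a dict of word -> vector."""
    v = emb[word]
    sims = {w: np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u))
            for w, u in emb.items() if w != word}
    return set(sorted(sims, key=sims.get, reverse=True)[:k])

def stability(word, emb_run1, emb_run2, k=10):
    """Overlap of neighbour sets between two runs, in [0, 1]."""
    n1 = top_k_neighbours(word, emb_run1, k)
    n2 = top_k_neighbours(word, emb_run2, k)
    return len(n1 & n2) / k
```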
Information is crucial to the function of a democratic society where well-informed citizens can make rational political decisions. While in the past political entities primarily utilized newspapers and later radio and television to inform the public, the political arena has transformed into a more complex structure with the rise of the Internet and online social media. Now, more than ever, people express themselves online while mainstream news agencies attempt to utilize the power of the Internet to spread their articles as much as possible. To grasp the political coexistence of mainstream media and online social media, in this paper, we analyze these two sources of information in the context of the U.S. 2020 presidential election. In particular, we collected data during the 2020 Democratic Party presidential primaries pertaining to the candidates, and, by analyzing this data, we highlight similarities and differences between these two main types of sources, detect the potential impact they have on each other, and understand how this impact relationship can change over time.
The COVID-19 pandemic has disrupted people's lives, driving them to act out of fear, anxiety, and anger, and leading to worldwide racist events in the physical world and in online social networks. Though there are works focusing on Sinophobia during the COVID-19 pandemic, less attention has been given to the recent surge in Islamophobia. A large number of positive cases arising out of the religious Tablighi Jamaat gathering drove people towards forming anti-Muslim communities around hashtags like #coronajihad and #tablighijamaatvirus on Twitter. In addition to the online spaces, the rise in Islamophobia has also resulted in increased hate crimes in the real world. Hence, an investigation is required to create interventions. To the best of our knowledge, we present the first large-scale quantitative study linking Islamophobia with COVID-19.
In this paper, we present the CoronaBias dataset, which focuses on anti-Muslim hate spanning four months, with over 410,990 tweets from 244,229 unique users. We use this dataset to perform longitudinal analysis. We relate the trends on Twitter to the offline events that happened over time, measure the qualitative changes in the context associated with the Muslim community, and perform macro and micro topic analysis to find prevalent topics. We also explore the nature of the content, focusing on the toxicity of the URLs shared within the tweets present in the CoronaBias dataset. Apart from the content-based analysis, we focus on user analysis, revealing that the portrayal of religion as a symbol of patriotism played a crucial role in deciding how the Muslim community was perceived during the pandemic. Through these experiments, we reveal the existence of anti-Muslim rhetoric around COVID-19 in the Indian subcontinent.
Hate speech is regarded as one of the crucial issues plaguing online social media. The current literature on hate speech detection leverages primarily the textual content to find hateful posts and subsequently identify hateful users. However, this methodology disregards the social connections between users. In this paper, we run a detailed exploration of the problem space and investigate an array of models ranging from purely textual, to graph-based, to semi-supervised techniques using Graph Neural Networks (GNNs) that utilize both textual and graph-based features. We run exhaustive experiments on two datasets: Gab, which is loosely moderated, and Twitter, which is strictly moderated. Overall, the AGNN model achieves a 0.791 macro F1-score on the Gab dataset and a 0.780 macro F1-score on the Twitter dataset using only 5% of the labeled instances, considerably outperforming all the other models, including the fully supervised ones. We perform detailed error analysis on the best performing text-based and graph-based models and observe that hateful users have unique network neighborhood signatures, and the AGNN model benefits by paying attention to these signatures. This property, as we observe, also allows the model to generalize well across platforms in a zero-shot setting. Lastly, we utilize the best performing GNN model to analyze the evolution of hateful users and their targets over time in Gab.
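The semi-supervised setting can be pictured with a minimal sketch: one simplified graph-convolution step over user features, followed by a cross-entropy loss computed only on the small labelled subset of nodes. The toy tensors below are assumptions; the sketch does not reproduce the AGNN architecture or the paper's features.

```python
# Hedged sketch of semi-supervised user classification on a graph with ~5% labels.
import torch
import torch.nn.functional as F

def gcn_layer(adj_norm, x, weight):
    """One propagation step: aggregate neighbour features, then transform."""
    return torch.relu(adj_norm @ x @ weight)

def semi_supervised_loss(logits, labels, labelled_mask):
    """Cross-entropy computed only on the small labelled subset of users."""
    return F.cross_entropy(logits[labelled_mask], labels[labelled_mask])

# Toy example: 4 users, 3-d text features, 2 classes (hateful / not hateful).
adj = torch.tensor([[1., 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]])
adj_norm = adj / adj.sum(dim=1, keepdim=True)            # simple row-normalisation
x = torch.randn(4, 3)
w1 = torch.randn(3, 8, requires_grad=True)
w2 = torch.randn(8, 2, requires_grad=True)
labels = torch.tensor([1, 0, 0, 0])
mask = torch.tensor([True, False, False, True])           # only 2 of 4 users labelled
logits = gcn_layer(adj_norm, x, w1) @ w2
loss = semi_supervised_loss(logits, labels, mask)
loss.backward()                                            # one training step (no optimiser shown)
```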
The popularity of Twitter has fostered the emergence of various fraudulent user activities; one such activity is to artificially bolster the social reputation of Twitter profiles by gaining a large number of followers within a short time span. Many users want to gain followers to increase the visibility and reach of their profiles to wide audiences. This has led several blackmarket services to garner huge attention by providing artificial followers via a network of agreeable and compromised accounts acting in a collusive manner. Their activity is difficult to detect, as the blackmarket services shape their behavior in such a way that users who are part of these services disguise themselves as genuine users. In this paper, we propose DECIFE, a framework to detect collusive users involved in producing 'following' activities through blackmarket services with the intention of gaining collusive followers in return. We first construct a heterogeneous user-tweet-topic network to leverage the follower/followee relationships and linguistic properties of a user. The heterogeneous network is then decomposed to form four different subgraphs that capture the semantic relations between the users. An attention-based subgraph aggregation network is proposed to learn and combine the node representations from each subgraph. The combined representation is finally passed on to a hypersphere learning objective to detect collusive users. Comprehensive experiments on our curated dataset are conducted to validate the effectiveness of DECIFE by comparing it with other state-of-the-art approaches. To our knowledge, this is the first attempt to detect collusive users involved in blackmarket 'following services' on Twitter.
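The hypersphere learning objective can be thought of as a Deep SVDD-style loss that pulls user representations towards a fixed centre, so that collusive users fall far from it at test time. The PyTorch fragment below is a hedged sketch under that assumption; DECIFE's subgraph-attention encoder is abstracted away behind placeholder embeddings.

```python
# Hedged sketch of a hypersphere (Deep SVDD-style) objective for user embeddings.
import torch

def hypersphere_loss(embeddings: torch.Tensor, centre: torch.Tensor) -> torch.Tensor:
    """Mean squared distance of user embeddings from the hypersphere centre."""
    return ((embeddings - centre) ** 2).sum(dim=1).mean()

def anomaly_score(embedding: torch.Tensor, centre: torch.Tensor) -> torch.Tensor:
    """Larger distance from the centre = more likely collusive."""
    return ((embedding - centre) ** 2).sum()

# Toy usage with random stand-ins for the encoder output.
emb = torch.randn(32, 16)            # embeddings from the (placeholder) encoder
centre = emb.mean(dim=0).detach()    # centre fixed from an initial pass
loss = hypersphere_loss(emb, centre)
```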
In this paper I address a problem related to video game culture with which modding has often engaged: the way queerness is portrayed in video games. I examine how mainstream games, indie games, and fan-created modifications relate to issues with queerness in video games. My goal is to analyze the ways in which modification can help players explore some of the problems surrounding portrayals of queerness in mainstream games. I also focus on the lessons that game designers and modders can learn from more positive portrayals of queerness in indie games. Overall, I suggest that considerations about a game's mainstream or indie status influence how both developers and players relate to queerness in games and argue that modding is a powerful way for players to engage with and explore issues with queerness found in mainstream games.
Recent work has shown that graph neural networks (GNNs) are vulnerable to adversarial attacks on graph data. Common attack approaches are typically informed, i.e. they have access to information about node attributes such as labels and feature vectors. In this work, we study adversarial attacks that are uninformed, where an attacker only has access to the graph structure, but no information about node attributes. Here the attacker aims to exploit structural knowledge and assumptions, which GNN models make about graph data. In particular, literature has shown that structural node centrality and similarity have a strong influence on learning with GNNs. Therefore, we study the impact of centrality and similarity on adversarial attacks on GNNs. We demonstrate that attackers can exploit this information to decrease the performance of GNNs by focusing on injecting links between nodes of low similarity and, surprisingly, low centrality. We show that structure-based uninformed attacks can approach the performance of informed attacks, while being computationally more efficient. With our paper, we present a new attack strategy on GNNs that we refer to as Structack. Structack can successfully manipulate the performance of GNNs with very limited information while operating under tight computational constraints. Our work contributes towards building more robust machine learning approaches on graphs.
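A structure-only attack of this flavour can be sketched in a few lines of NetworkX: rank nodes by degree centrality, keep the least central ones, and connect the pairs among them with the lowest Jaccard neighbourhood similarity. The exact node-pairing strategy of Structack may differ; this is an illustrative approximation with an assumed candidate-pool size.

```python
# Hedged sketch of an uninformed, structure-based attack: inject edges between
# low-centrality, low-similarity node pairs.
import networkx as nx

def structure_attack(G: nx.Graph, budget: int) -> nx.Graph:
    centrality = nx.degree_centrality(G)
    low_nodes = sorted(G.nodes, key=centrality.get)[: 4 * budget]   # least central nodes
    # Candidate (non-adjacent) pairs among them, scored by Jaccard similarity.
    pairs = [(u, v) for i, u in enumerate(low_nodes) for v in low_nodes[i + 1:]
             if not G.has_edge(u, v)]
    scored = list(nx.jaccard_coefficient(G, pairs))
    scored.sort(key=lambda t: t[2])                                  # least similar first
    G_atk = G.copy()
    for u, v, _ in scored[:budget]:
        G_atk.add_edge(u, v)                                         # inject adversarial link
    return G_atk

# Example: perturb a small random graph with 5 injected edges.
G = nx.erdos_renyi_graph(50, 0.05, seed=1)
G_perturbed = structure_attack(G, budget=5)
```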
A critical barrier facing engineering is the limited inclusion of women in the profession. In recent years, engineering diversity advocates have taken to social media platforms to raise awareness of the issue and redress this problem. A recurring challenge for their initiatives, though, is attracting and mobilizing participants efficiently. For a successful mobilization campaign, organizers need real-time information about their users and also need to understand what messaging works to attract and mobilize them. We hypothesize that participants in any given campaign related to engineering diversity will also be interested in other campaigns related to that issue. Furthermore, since the primary signal for a social media campaign is a hashtag, by using clustering patterns of various co-occurring hashtags along with relevant topics and relatable sentiments, we can better understand participation and also mobilize users for the target campaign.
To empirically examine our hypothesis, we study two diversity hashtag activism campaigns on Twitter (#ILookLikeAnEngineer and #WomenInEngineering) using a real-time predictive analytics framework. We design and evaluate the framework with a set of novel features that uses retweetability as an indicator of participation.
Our analysis of topical features found that monetary gain and advertisement-oriented content were less likely to be propagated in the campaigns, whereas messaging aligned directly with the issue at hand, such as breaking stereotypes in engineering, was deemed more retweetable and engaging.
In terms of sentiments, an informal tone in the messages was considered desirable, whereas short-form messaging was not very popular in either movement.
These analytical insights can inform activists in effective resource mobilization through message content design, in order to expand the reach of an activism campaign. Our work shows how data-driven techniques can assist in increasing the participation of women in engineering education and the workforce.
Detecting crisis events accurately is an important task, as it allows the relevant authorities to implement necessary actions to mitigate damage. For this purpose, social media serve as a timely information source due to their prevalence and high volume of first-hand accounts. While there are prior works on crisis detection, many of them do not perform crisis embedding and classification using state-of-the-art attention-based deep neural network models, such as Transformers and document-level contextual embeddings. In contrast, we propose CrisisBERT, an end-to-end transformer-based model for two crisis classification tasks, namely crisis detection and crisis recognition, which shows promising results across accuracy and F1 scores. The proposed CrisisBERT model demonstrates superior robustness over various benchmarks, with only a marginal performance compromise when extending from 6 to 36 events with a mere 51.4% additional data points. We also propose Crisis2Vec, an attention-based, document-level contextual embedding architecture for crisis embedding, which achieves better performance than conventional crisis embedding methods such as Word2Vec and GloVe.
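For concreteness, the fragment below shows what transformer-based crisis classification looks like with the Hugging Face transformers library: a pretrained encoder fine-tuned to assign tweets to crisis events. The model name, label set, and single gradient step are illustrative assumptions, not the CrisisBERT configuration.

```python
# Hedged sketch: fine-tune a pretrained transformer to label tweets with a crisis event.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=6)           # e.g. 6 crisis events (assumed)

texts = ["Flooding reported downtown, roads closed",
         "Wildfire smoke visible from the highway"]
labels = torch.tensor([0, 3])                     # toy event ids

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=labels)               # out.loss, out.logits
out.loss.backward()                               # one fine-tuning step (no optimiser shown)
```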
Given the important role of search engines in our everyday lives, a better understanding of the information needs that guide our information seeking behavior is essential. Known-item needs form a particular type of information need and occur when a user has a limited but concrete description of an existing object and would like to (re-)find it. Most studies of known-item needs have focused on the short query representations of these needs as they occur in search engine logs. In this article, we focus on richer, more complex known-item need representations posted to six dedicated Reddit discussion forums in the casual leisure domain. An analysis of 462 known-item requests from these subreddits revealed 33 different relevance aspects of items in a variety of different domains. Some of these aspects are highly domain-specific, while others are broadly applicable across domains. The domain of the item sought also has a strong influence on the length of the known-item requests. Our findings can be used to prioritize efforts to help existing search engines better support known-item needs, both by highlighting which aspects are easier to classify automatically and by determining which information sources should be added to a search engine's index.
Recent research has shown that explanations serve as an important means to increase transparency in group recommendations, while also increasing users' privacy concerns. However, it is currently unclear what personal and contextual factors affect users' privacy concerns about various types of personal information. This paper studies the effect of users' personality traits and preference scenarios (having a majority or minority preference) on their privacy concerns regarding location and emotion information. To create natural scenarios of group decision-making where users can control the amount of information disclosed, we develop TouryBot, a chatbot agent that generates natural language explanations to help group members explain their arguments for suggestions to the group in the tourism domain. We conducted a user study in which we instructed 541 participants to convince the group to either visit or skip a recommended place. Our results show that users generally have a larger concern regarding the disclosure of emotion compared to location information. However, we found no evidence that personality traits or preference scenarios affect privacy concerns in our task. Further analyses revealed that task design (i.e., the pressure on users to convince the group) had an effect on participants' emotion-related privacy concerns. Our study also highlights the utility of providing users with the option of partial disclosure of personal information, which appeared to be popular among the participants.
Designing tasks clearly to facilitate accurate task completion is a challenging endeavor for requesters on crowdsourcing platforms. Prior research shows that inexperienced requesters fail to write clear and complete task descriptions, which directly leads to low-quality submissions from workers. Complementing existing work that has aimed to address this challenge, in this paper we study whether clarity flaws in task descriptions can be identified automatically using natural language processing methods. We identify and synthesize seven clarity flaws in task descriptions that are grounded in relevant literature. We build both BERT-based and feature-based binary classifiers, in order to study the extent to which clarity flaws in task descriptions can be computationally assessed, and to understand textual properties of descriptions that affect task clarity. Through a crowdsourced study, we collect annotations of clarity flaws in 1332 real task descriptions. Using this dataset, we evaluate several configurations of the classifiers. Our results indicate that nearly all the clarity flaws in task descriptions can be assessed reasonably well by the classifiers. We found that the content, style, and readability of task descriptions are particularly important in shaping their clarity. This work has important implications for the design of tools to help requesters improve task clarity on crowdsourcing platforms. Flaw-specific properties can provide valuable guidance for improving task descriptions.
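The feature-based route can be sketched as follows: each task description is mapped to a handful of surface features (length, average word length, sentence length, presence of explicit steps) and a binary classifier is trained per clarity flaw. The feature set and toy labels below are placeholders; the paper's classifiers use a richer set of features.

```python
# Hedged sketch of a feature-based binary classifier for one clarity flaw.
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(description: str) -> list:
    """Simple surface features of a task description (illustrative only)."""
    words = description.split()
    n_words = len(words)
    avg_word_len = np.mean([len(w) for w in words]) if words else 0.0
    n_sentences = max(description.count("."), 1)
    has_steps = int(any(w.lower().startswith(("1.", "step")) for w in words))
    return [n_words, avg_word_len, n_words / n_sentences, has_steps]

# Toy data: 1 = description exhibits the flaw (e.g. "missing instructions").
X = np.array([features(d) for d in ["Label the image.",
                                    "Step 1. Read the text. Step 2. Mark errors."]])
y = np.array([1, 0])
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```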
This paper investigates the correlation between moral foundations and the expression of opinions in the form of stance on different issues of public interest. This work is based on the assumption that the formation of values (personal and societal) and language are interrelated, and that we can observe differences in points of view in user-generated text data. We leverage the Moral Foundations Theory to expand the scope of stance analysis by examining the narratives in favor of or against several topics. Applying an expanded version of the Moral Foundations Dictionary to a benchmark dataset for stance analysis, we capture and analyze the relationships between moral values and polarized online discussions. Using this enhanced methodology, we find that each social issue has a different "moral and lexical profile." While some social issues project more authority-related words (Donald Trump), others consist of words related to care and purity (abortion and feminism). Our correlation analysis of stance and morality revealed notable associations between stances on social issues and various types of morality, such as care, fairness, and loyalty, hence demonstrating that certain morality types contribute more to stance classification than others. Overall, our analysis highlights the usefulness of considering morality when studying stance. The differences observed in various viewpoints and stances highlight linguistic variation in discourse, which may assist in analyzing cultural values and biases in society.
During online information search, users tend to select search results that confirm previous beliefs and ignore competing possibilities. This systematic pattern in human behavior is known as confirmation bias. In this paper, we study the effect of obfuscation (i.e., hiding the result unless the user clicks on it) with warning labels and the effect of task on interaction with attitude-confirming search results. We conducted a preregistered, between-subjects crowdsourced user study (N=328) comparing six groups: three levels of obfuscation (targeted, random, none) and two levels of task (joint, two separate) for four debated topics. We found that both types of obfuscation influence user interactions, and in particular that targeted obfuscation helps decrease interaction with attitude-confirming search results. Future work is needed to understand how much of the observed effect is due to the strong influence of obfuscation, versus the warning label or the task design. We discuss design guidelines concerning system goals such as decreasing consumption of attitude-confirming search results, versus nudging users toward a more analytical mode of information processing. We also discuss implications for future work, such as the effects of interventions for confirmation bias mitigation over repeated exposure. We conclude with a strong word of caution: measures such as obfuscations should only be used for the benefit of the user, e.g., when they explicitly consent to mitigating their own biases.
Complaining is a speech act that is often used by consumers to signify a breach of expectation, i.e., an expression of displeasure on a consumer's behalf towards an organization, product, or event. Complaint identification has previously been analyzed based on extensive feature engineering in centralized settings, disregarding the non-independent and identically distributed (non-IID), security, and privacy-preserving characteristics of complaints that can hamper data accumulation, distribution, and learning. In this work, we propose a Bidirectional Encoder Representations from Transformers (BERT) based multi-task framework that aims to learn two closely related tasks, viz. complaint identification (primary task) and sentiment classification (auxiliary task), concurrently under federated-learning settings. Extensive evaluation on two real-world datasets shows that our proposed framework surpasses the baselines and state-of-the-art framework results by a significant margin.
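The federated side of such a setup is commonly realised with FedAvg: each client fine-tunes a local copy of the shared model on its own (non-IID) complaint data, and the server averages the resulting weights. The sketch below abstracts the BERT multi-task heads behind ordinary PyTorch state dicts and is not the authors' exact training loop.

```python
# Hedged sketch of FedAvg weight averaging across clients.
import copy
import torch

def fed_avg(client_states, client_sizes):
    """Weighted average of client model state_dicts, proportional to data size."""
    total = sum(client_sizes)
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(s[key] * (n / total)
                       for s, n in zip(client_states, client_sizes))
    return avg

# Toy example with two tiny linear "clients" standing in for BERT-based models.
global_model = torch.nn.Linear(4, 2)
clients = [copy.deepcopy(global_model) for _ in range(2)]
# ...local multi-task training on each client's complaint data would happen here...
new_state = fed_avg([c.state_dict() for c in clients], client_sizes=[120, 80])
global_model.load_state_dict(new_state)
```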
This paper investigates genre and media specificity of electronic literature created in Scalar. Scalar is a platform and authoring tool created specifically for humanities scholars to enable multimodal and multilinear publications. Besides scholarly work, Robert Budac's The Scalar Conspiracy [4], Steven Wingate's daddylabyrinth: a digital lyric memoir [12] and micha cárdenas' Redshift & Portalmetal [5] are all works of electronic literature created in Scalar. I demonstrate that all three of these works use Scalar to create genre-bending texts that build on and subvert the technological affordances as well as the contextual connotations that Scalar provides. The Scalar Conspiracy parodies the counter-intuitive user interface elements by making the reader investigate the text's different hidden messages. daddylabyrinth: a digital lyric memoir destabilizes the genre of the (auto)biography by promoting documentation and research while continuously showing how these processes fall short during the writing and reading process. Redshift & Portalmetal favors experience over documentation to create a work that is both immersive and theory-building. These Scalar fictions are characterized by the premise that the platform's academic context strengthens the narrative. Researching the multimodality and academic context as integral parts of the narrative structure opens up the opportunity to reckon with the platform-specificity across genres.
Most hate speech detection research focuses on a single language, generally English, which limits its generalisability to other languages. In this paper, we investigate the cross-lingual hate speech detection task, tackling the problem by adapting hate speech resources from one language to another. We propose a cross-lingual capsule network learning model coupled with extra domain-specific lexical semantics for hate speech (CCNL-Ex). Our model achieves state-of-the-art performance on benchmark datasets from AMI@Evalita2018 and AMI@Ibereval2018 involving three languages: English, Spanish and Italian, outperforming state-of-the-art baselines on all six language pairs.
The rise of fake news in the past decade has brought with it a host of consequences, from swaying opinions on elections to generating uncertainty during a pandemic. A majority of methods developed to combat disinformation either focus on fake news content or malicious actors who generate it. However, the virality of fake news is largely dependent upon the users who propagate it. A deeper understanding of these users can contribute to the development of a framework for identifying users who are likely to spread fake news. In this work, we study the characteristics and motivational factors of fake news spreaders on social media with input from psychological theories and behavioral studies. We then perform a series of experiments to determine if fake news spreaders can be found to exhibit different characteristics than other users. Further, we investigate our findings by testing whether the characteristics we observe amongst fake news spreaders in our experiments can be applied to the detection of fake news spreaders in a real social media environment.
Blind users interact with smartphone applications using a screen reader, an assistive technology that enables them to navigate and listen to application content using touch gestures. Since blind users rely on screen reader audio, interacting with online videos can be challenging due to the screen reader audio interfering with the video sounds. Existing solutions to address this interference problem are predominantly designed for desktop scenarios, where special keyboard or mouse actions are supported to facilitate 'silent' and direct access to various video controls such as play, pause, and the progress bar. As these solutions are not transferable to smartphones, suitable alternatives are desired. In this regard, we explore the potential of motion gestures in smartphones as an effective and convenient method for blind screen reader users to interact with online videos. Specifically, we designed and developed YouTilt, an Android application that enables screen reader users to exploit an assortment of motion gestures to access and manipulate various video controls. We then conducted a user study with 10 blind participants to investigate whether blind users can leverage YouTilt to properly execute motion gestures for video-interaction tasks while simultaneously listening to video sounds. Analysis of the study data showed a significant improvement in usability, by as much as 43.3% (avg.), with YouTilt compared to the default screen reader, and overall a positive attitude towards and acceptance of motion gesture-based video interaction.
Although the internet is a means for disseminating information and facilitating social interactions, these benefits are limited due to individuals' propensity for engaging within a narrow range of communities that share similar beliefs. A portion of these online communities facilitate radicalist viewpoints, including toward marginalized populations, contributing to misbehavior and exacerbating social inequalities. Although a variety of theories propose to explain the processes of online radicalization, less work has empirically examined how users' communication patterns change over time, especially in terms of novelty versus regularity of user comment features. The present research demonstrates a new modeling approach for examining the extent to which low-level, multimodal comment patterns evolve as users communicate within a Reddit forum well-known for its extreme misogynism. Our results confirm that low-level comment patterns predict high-level features of radicalization, aligning with theory on attitude polarization and contributing to literature on detection and interventions to mitigate extremism.
In this paper we report on a complex and complete archive of historical primary sources that map the political landscape of the anglophone world in the mid-to-late 1800s. The ruthless pragmatism applied to the construction of the initial Humanities dataset resulted in an analogue equivalent of a hypertext system, which has already resulted in published academic books and articles. Here, we describe the processes of a current project, which consists of the translation of this analogue information aggregation system into a graph database using Linked Data and Semantic Web technologies.
Nowadays, the large-scale human activity traces on social media platforms such as Twitter provide new opportunities for various research areas, such as mining user interests, understanding user behaviors, or conducting social science studies at a large scale. However, social media platforms contain not only individual accounts but also accounts that are associated with non-individuals such as organizations or brands. Therefore, distinguishing individuals from all other accounts is crucial when we conduct research such as understanding human behavior based on data retrieved from those platforms. In this paper, we propose a language-independent approach for distinguishing individuals from non-individuals that focuses on leveraging their profile images, which has not been explored in previous studies. Extensive experiments on two datasets show that our proposed approach can provide competitive performance with state-of-the-art language-dependent methods, and outperforms alternative language-independent ones.
This paper aims to investigate the use of emojis to contextualize mourning on Twitter. Specifically, we seek to determine (i) whether an emoji is sufficient to contextualize expressions of grief; (ii) which emojis most accurately represent mourning; (iii) whether only words are used to contextualize mourning; (iv) which words are used to characterize mourning in tweets; and (v) whether there are differences in the expression of mourning in different languages. For this, we use a multi-stage method to conduct a comprehensive analysis of the manifestations of grieving behavior on Twitter, and create machine learning models to classify expressions of mourning in tweets. The main contributions of this work are (1) a gold standard of manually annotated mourning tweets; (2) classification models produced using machine learning ensemble methods and BERT contextual embeddings; and (3) an extensive analysis of our findings opening up opportunities for new research. The results of this paper reveal that emojis alone are insufficient for identifying expressions of mourning in tweets, and that the combination of both emojis and words is the most effective strategy for contextualizing mourning online: the models achieved F1 scores of 84.8%-97% across all datasets. Although words alone are capable of characterizing mourning contexts correctly, the English vocabulary is limited, and the contribution of RIP, the abbreviation for "rest in peace", is highly decisive. Our results have also shown that the most relevant emojis for this context were emotional ones, such as the broken heart emoji, and that emojis are used in a uniform fashion in both Spanish and English.
Teenager detection is an important case of the age detection task in social media, which aims to detect teenage users in order to protect them from negative influences. The teenager detection task suffers from the scarcity of labelled data, which hampers the ability to perform well across social media platforms. To further research on teenager detection in settings where no labelled data is available for a platform, we propose a novel cross-platform framework based on Adversarial BERT. Our framework can operate with a limited amount of labelled instances from the source platform and with no labelled data from the target platform, transferring knowledge from the source to the target social media platform. We experiment on four publicly available datasets, obtaining results demonstrating that our framework can significantly improve over competitive baseline models on the cross-platform teenager detection task.
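Cross-platform adversarial training of this kind typically relies on a gradient-reversal layer, so that a platform discriminator pushes the encoder towards platform-invariant features while the main head predicts the age class. The PyTorch sketch below uses a placeholder encoder output and random labels purely for illustration; it is not the paper's Adversarial BERT architecture.

```python
# Hedged sketch of domain-adversarial training via gradient reversal.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None   # flip gradients flowing to the encoder

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Toy usage: features from a (placeholder) encoder feed two heads.
feats = torch.randn(8, 16, requires_grad=True)        # stand-in for BERT features of 8 users
age_head = torch.nn.Linear(16, 2)                      # teenager vs adult
platform_head = torch.nn.Linear(16, 2)                 # source vs target platform
age_loss = torch.nn.functional.cross_entropy(age_head(feats), torch.randint(0, 2, (8,)))
platform_loss = torch.nn.functional.cross_entropy(
    platform_head(grad_reverse(feats)), torch.randint(0, 2, (8,)))
(age_loss + platform_loss).backward()
```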
This paper presents a way for the hypertext community to gain strength and contribute to other fields of research by joining forces. It discusses the challenges that need to be addressed with respect to geographically scattered students and scholars, interdisciplinary courses, and students with varying prior knowledge. We propose the INTR/HT project, a platform that aims to bring hypertext scholars and students together worldwide. The interdisciplinary approach fosters creativity in the context of hypertext and is valuable for educating and supporting the next generation of hypertext scholars and researchers.
Over the last few decades, we have seen massive improvements in computing power, but nevertheless we still rely on digital documents and file systems that were originally created by mimicking the characteristics of physical storage media with all its limitations. This is quite surprising given that even before the existence of the computer, Information Science visionaries such as Vannevar Bush described more powerful information management solutions. We therefore aim to improve the way information is managed in modern desktop environments by embedding a hypermedia engine offering rich hypermedia and cross-media concepts at the level of an operating system. We discuss the resource-selector-link (RSL) hypermedia metamodel as a candidate for realising such a general hypermedia engine and highlight its flexibility based on a number of domain-specific applications that have been developed over the last two decades. The underlying content repository will no longer rely on monolithic files, but rather contain a user's data in the form of content fragments, such as snippets of text or images, which are structurally linked to form the corresponding documents, and can be reused in other documents or even shared across computers. By increasing the scope to a system-wide hypermedia engine, we have to deal with fundamental challenges related to granularity, interoperability or context resolving. We strongly believe that computing technology has evolved enough to revisit and address these challenges, laying the foundation for a wide range of innovative use cases for efficiently managing cross-media content in modern desktop environments.
Modern browsers, as we know them from the Web, are used to query and present a variety of different resources. This usually happens by traversing links (i.e., URIs) in hypertext documents. The creation of new links, however, is impossible for ordinary users, because they are usually recipients, but not owners, of the received resource. In this paper, we demonstrate a browser plugin called "Weblinks", which offers its users an additional, rich linking layer over the existing Web. It extends the notion of links as strings (i.e., URIs) in today's Web to links as rich objects (n-ary, unidirectional, or bidirectional), which can be created, traversed, or shared by anyone using the Weblinks browser plugin.
Social media advertising data, particularly data from Facebook's advertising platform, have been successfully used for monitoring population and development indicators, with an emphasis on monitoring digital gender inequality. This paper contributes to this literature by assessing the feasibility of using the user-behavior attribute "using a mobile device for X months", available from Facebook's advertising platform, to understand short-term global mobile diffusion dynamics and mobile phone gender gaps. We compare this attribute with other features of the platform to form a better understanding of the data and the digital behaviours they capture, and show how this Facebook attribute relates to mobile phone penetration rates and gender gaps in mobile access. We find that this "Uses a mobile device (X months)" advertising targeting attribute can be used as a proxy for changes in mobile phone penetration rates, especially among younger users, and that it captures cross-national variation in mobile gender gaps. We further find that countries with larger gender gaps disfavoring women are comparatively more gender-equal among the most recently joined cohort.
The growth of online digital and social media has allowed a variety of ideas and opinions to coexist. Social media has appealed to users due to the ease of fast dissemination of information at low cost and easy access. However, with the growing affordances of digital platforms, users have become prone to consuming disinformation, misinformation, propaganda, and conspiracy theories. In this paper, we wish to explore the links between the personality traits given by the Big Five Inventory and users' susceptibility to disinformation. More specifically, this study aims to capture the short-term as well as the long-term effects of disinformation across the five personality traits. Further, we expect to observe that different personality traits exhibit different shifts in opinion and different increases or decreases in uncertainty on an issue after consuming the disinformation. Based on the findings of this study, we would like to propose personalized, narrative-based approaches to behavior change for different personality traits.