While large language models (LLMs) exhibit significant utility across various domains, they are simultaneously susceptible to exploitation for unethical purposes, including academic misconduct and the dissemination of misinformation. Consequently, AI-generated text detection systems have emerged as a countermeasure. However, these detection mechanisms demonstrate vulnerability to evasion techniques and lack robustness against textual manipulations. This paper introduces back-translation as a novel technique for evading detection, underscoring the need to enhance the robustness of current detection systems. The proposed method involves translating AI-generated text through multiple languages before back-translating to English. We present a model that combines these back-translated texts to produce a manipulated version of the original AI-generated text. Our findings demonstrate that the manipulated text retains the original semantics while significantly reducing the true positive rate (TPR) of existing detection methods. We evaluate this technique on nine AI detectors, including six open-source and three proprietary systems, revealing their susceptibility to back-translation manipulation. In response to the identified shortcomings of existing AI text detectors, we present a countermeasure to improve robustness against this form of manipulation. Our results indicate that the TPR of the proposed method declines by only \(1.85\%\) after back-translation manipulation. Furthermore, we build a large dataset of 720k texts using eight different LLMs. Our dataset contains both human-authored and LLM-generated texts in various domains and writing styles to assess the performance of our method and existing detectors. This dataset is publicly shared for the benefit of the research community.
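To make the core idea concrete, the following is a minimal sketch of back-translation through pivot languages using publicly available MarianMT checkpoints on Hugging Face. The pivot languages, model choices, and the way variants are returned are illustrative assumptions; the paper's model for combining the back-translated texts is not reproduced here.

```python
# Illustrative back-translation through pivot languages (a sketch, not the
# authors' pipeline): translate English text into a pivot language and back.
from transformers import pipeline

PIVOTS = ["fr", "de"]  # hypothetical pivot languages


def back_translate(text: str, pivot: str) -> str:
    to_pivot = pipeline("translation", model=f"Helsinki-NLP/opus-mt-en-{pivot}")
    to_english = pipeline("translation", model=f"Helsinki-NLP/opus-mt-{pivot}-en")
    pivot_text = to_pivot(text, max_length=512)[0]["translation_text"]
    return to_english(pivot_text, max_length=512)[0]["translation_text"]


def back_translated_variants(ai_text: str) -> list[str]:
    """Return one variant per pivot language; the paper additionally combines
    such variants with a learned model, which is omitted here."""
    return [back_translate(ai_text, p) for p in PIVOTS]


if __name__ == "__main__":
    for variant in back_translated_variants("Large language models generate fluent text."):
        print(variant)
```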
Despite widespread concerns about the risks of AI-generated content (AIGC) to the integrity of social media discourse, little is known about its scale and scope, the actors responsible for its dissemination online, and the user responses it elicits. In this work, we measure and characterize the prevalence, spreaders, and emotional reception of AI-generated political images. Analyzing a large-scale dataset from Twitter/\(\mathbb{X}\) related to the 2024 U.S. Presidential Election, we find that approximately 12% of shared images are detected as AI-generated, and around 10% of users are responsible for sharing 80% of AI-generated images. AIGC superspreaders—defined as the users who not only share a high volume of AI-generated images but also receive substantial engagement through retweets—are more likely to be \(\mathbb{X}\) Premium subscribers, have a right-leaning orientation, and exhibit automated behavior. Their profiles contain a higher proportion of AI-generated images than non-superspreaders, and some engage in extreme levels of AIGC sharing. Moreover, superspreaders’ AI image tweets elicit more positive and less toxic responses than their non-AI image tweets. This study serves as one of the first steps toward understanding the role generative AI plays in shaping online socio-political environments and offers implications for platform governance.
Science news is increasingly important in connecting scientists and the public by sharing discoveries and innovations. With the rise of large language models (LLMs), there is potential to automate science news creation, but concerns exist about the quality of LLM-generated news versus human-written news. This paper explores whether LLMs can outperform humans in distinguishing between human-written and LLM-generated news. Inspired by the Chain-of-Thought prompting method, we designed a simple yet effective variant called Guided Few-shot (GFS), which encodes the characteristics of the two types of news together with examples. Our experiments indicated that GFS with just a single example effectively boosted the performance of all LLMs and enabled open-weight LLMs to match or exceed the performance of humans and commercial LLMs. The code and data are available in our repository at https://github.com/lamps-lab/sanews.
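As a rough illustration of how a Guided Few-shot prompt might be assembled, the sketch below combines a description of the characteristics of both news types with a single labeled example. The guidance wording, example, and field names are placeholders rather than the prompt used in the paper; the repository above contains the actual code and data.

```python
# Illustrative construction of a Guided Few-shot (GFS) prompt: characteristics
# of each class plus one labeled example. All wording below is a placeholder.
GUIDANCE = (
    "Human-written science news tends to ... (characteristics supplied by the researcher).\n"
    "LLM-generated science news tends to ... (characteristics supplied by the researcher)."
)

EXAMPLE = {
    "article": "<one labeled science-news article goes here>",
    "label": "LLM-generated",
}


def build_gfs_prompt(article: str) -> str:
    return (
        f"{GUIDANCE}\n\n"
        f"Example article:\n{EXAMPLE['article']}\n"
        f"Label: {EXAMPLE['label']}\n\n"
        f"Article:\n{article}\n"
        "Is this article human-written or LLM-generated? Answer with one label."
    )


print(build_gfs_prompt("Researchers report a new battery chemistry ..."))
```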
Stance detection is vital for promoting a trustworthy, human-centric Web by identifying biased or harmful narratives in user-generated content. While recent LLM-based methods excel in accuracy, they often lack interpretability. We propose a generative stance detection approach that outputs explicit rationales and distills them into smaller language models (SLMs) via single-task and multitask learning. Our method enables Flan-T5 to outperform GPT-3.5 zero-shot by up to 9.57%. We further show that rationales enhance multitask performance and improve distillation fidelity, advancing the development of transparent, fair, and trustworthy NLP systems.
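One plausible way to format the distillation data is sketched below: the teacher's stance label (single-task) or label plus rationale (multitask) becomes the target sequence for a smaller seq2seq student such as Flan-T5. The templates and field names are assumptions, not the authors' exact formats.

```python
# Hypothetical formatting of rationale distillation data (templates and field
# names are assumptions): the teacher's output becomes the student's target.
def single_task_example(ex: dict) -> dict:
    return {
        "input": ("Detect the stance of the text toward the target.\n"
                  f"Target: {ex['target']}\nText: {ex['text']}"),
        "target": ex["stance"],
    }


def multitask_example(ex: dict) -> dict:
    # Multitask variant: the student also learns to reproduce the rationale.
    return {
        "input": ("Detect the stance of the text toward the target and explain why.\n"
                  f"Target: {ex['target']}\nText: {ex['text']}"),
        "target": f"Stance: {ex['stance']}\nRationale: {ex['rationale']}",
    }


ex = {
    "target": "carbon tax",
    "text": "We cannot keep subsidizing coal while the planet warms.",
    "stance": "favor",
    "rationale": "The author criticizes coal subsidies, implying support for a carbon tax.",
}
print(multitask_example(ex)["target"])
```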
In an era of information overload, research writing, particularly literature review composition, has become increasingly burdensome due to the sheer volume of scholarly publications released each year. This paper introduces WriteAssist, a novel standalone authoring system that helps researchers efficiently generate literature review sections. Given the title and abstract of a work-in-progress manuscript, WriteAssist automatically retrieves relevant and recent peer-reviewed articles, highlighting portions that offer supporting or contrasting perspectives. A key innovation lies in its personalized recommendation engine, which tailors results based on the user’s prior publications and research profile, enabling context-aware synthesis. We position WriteAssist within the landscape of intelligent writing assistants, academic search platforms, and personalized recommender systems, and we detail its architecture – integrating natural language processing and user modeling to streamline academic writing. The system represents a significant step toward alleviating cognitive overload in scholarly composition and offers a blueprint for smarter, adaptive tools in academic research support.
In this paper, we analyze how friend recommendation algorithms on social networks promote echo chambers. We analyze both link bias and content bias using a real social graph from X (Twitter). We extract a follow graph from X, repeatedly add new edges selected by a recommendation algorithm, and observe how the degree of bias in the graph changes. Our findings include: (1) the follow graph of X is sufficiently homophilic for recommendation algorithms to produce link bias, (2) iterated recommendations do not accelerate the increase of content bias, (3) even when an algorithm recommends no user from the target user’s community, it sometimes produces link bias by recommending users from a few other communities, and (4) no similar phenomenon is observed for content bias.
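The loop below is an illustrative sketch of this simulation setup, assuming a naive friend-of-friend recommender over a directed follow graph; the recommendation algorithms and the link/content bias measures analyzed in the paper are not reproduced.

```python
# Illustrative simulation (not the paper's algorithm): repeatedly add follow
# edges proposed by a naive friend-of-friend recommender; bias metrics would
# then be computed on the evolved graph.
import networkx as nx


def recommend(g: nx.DiGraph, user, k: int = 1) -> list:
    """Rank unfollowed accounts by how many of the user's followees follow them."""
    counts = {}
    for followee in g.successors(user):
        for candidate in g.successors(followee):
            if candidate != user and not g.has_edge(user, candidate):
                counts[candidate] = counts.get(candidate, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)[:k]


def simulate(g: nx.DiGraph, rounds: int = 10) -> nx.DiGraph:
    for _ in range(rounds):
        for user in list(g.nodes):
            for candidate in recommend(g, user):
                g.add_edge(user, candidate)  # the user accepts the recommendation
    return g
```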
Social media platforms like Twitter (now X) serve as vital arenas for public discourse but are marked by high content volatility, particularly due to tweet deletions. These deletions may stem from personal reconsideration or indicate coordinated, strategic behavior. This study investigates deletion patterns in two large-scale datasets: general discourse (TweetsKB) and COVID-19-related discourse (TweetsCOV19), with a focus on how deletion behavior is influenced by topic and ideological polarization across political and scientific dimensions. We find that 29% of TweetsKB and 23% of TweetsCOV19 tweets are deleted within a year of posting. Deleted tweets tend to express more negative sentiment than retained ones. In both datasets, polarized tweets show lower overall deletion rates, while tweets polarized along the science dimension are more likely to be deleted than politically polarized tweets. These findings reveal how deletion dynamics intersect with the structure of online debate, offering insights into the lifecycle of digital discourse and the impact of polarization on content persistence.
Nowadays, most listeners access music through streaming platforms, which has transformed how recommendations are delivered and received. Notable work has been done on improving fairness for end users or item providers in music recommender systems, often applying pre-processing and re-ranking approaches. However, the positive influence users may exert on fairness through their music selection has not been sufficiently studied. This paper explores whether providing users with insights into the fairness and diversity of recommendations could lead to fairer music selections. To this aim, we conducted a qualitative study with 18 participants involving a think-aloud playlist task and in-depth interviews. The results indicate that while music taste was the deciding factor in participants’ choices, insights into fairness did help them reflect on their selections. Our study shows that participants’ playlist choices at times conflict with their expressed fairness values, highlighting the need to support users in aligning decisions with their values.
Recent advances on the web have made social media a significant platform for disseminating scientific information. However, posts referencing peer-reviewed research often lack sufficient context for non-experts to assess their credibility. In this study, we investigate how different strategies for enriching social media posts referencing scientific publications affect users’ trust perceptions and sharing behavior. We developed a web-based platform that simulates a social media feed containing posts with links to scientific publications and conducted a user study (N=160), comparing four conditions: a baseline with original posts, and three enriched variants containing (1) metadata from the publication (title, abstract, authors), (2) a direct quote from the publication, and (3) an AI-generated summary of the publication. Our results show that enriched posts were shared more frequently than baseline posts, though trustworthiness ratings did not significantly differ. Furthermore, the AI-generated summaries were perceived as the most understandable form of enrichment. Interaction data showed that users were more likely to engage with the enriched content than with posts containing only links to the publications.
The rise of social networking platforms has amplified privacy threats as users increasingly share sensitive information across profiles, content, and social connections. We present a Comprehensive Privacy Risk Scoring (CPRS) framework that quantifies privacy risk by integrating user attributes, social graph structures, and user-generated content. Our framework computes risk scores across these dimensions using sensitivity, visibility, structural similarity, and entity-level analysis, then aggregates them into a unified risk score. We validate CPRS on two real-world datasets: the SNAP Facebook Ego Network (4,039 users) and the Koo microblogging dataset (1M posts, 1M comments). The average CPRS is 0.478 with equal weighting, rising to 0.501 in graph-sensitive scenarios. Component-wise, graph-based risks (mean 0.52) surpass content (0.48) and profile attributes (0.45). High-risk attributes include Email, Date of Birth, and Mobile Number. Our user study with 100 participants shows 85% rated the dashboard as clear and actionable, confirming CPRS’s practical utility. This work enables personalized privacy risk insights and contributes a holistic, scalable methodology for privacy management. Future directions include incorporating temporal dynamics and multimodal content for broader applicability.
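A minimal sketch of the aggregation step is given below, assuming the unified score is a weighted combination of the profile-, graph-, and content-level risks. The weights and inputs are illustrative; the per-dimension computations (sensitivity, visibility, structural similarity, entity-level analysis) are not reproduced.

```python
# Illustrative aggregation of dimension-level risks into a unified CPRS score.
# Weights and inputs are assumptions; per-dimension scoring is omitted.
def cprs(profile_risk: float, graph_risk: float, content_risk: float,
         weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    w_p, w_g, w_c = weights
    assert abs(w_p + w_g + w_c - 1.0) < 1e-9, "weights should sum to 1"
    return w_p * profile_risk + w_g * graph_risk + w_c * content_risk


# Component means reported in the abstract, used here purely for illustration.
print(cprs(0.45, 0.52, 0.48))                              # equal weighting
print(cprs(0.45, 0.52, 0.48, weights=(0.25, 0.5, 0.25)))   # graph-sensitive weighting
```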
Online platforms offer forums with rich, real-world illustrations of moral reasoning. Among these, the r/AmITheAsshole (AITA) subreddit has become a prominent resource for computational research. In AITA, a user (author) describes an interpersonal moral scenario, and other users (commenters) provide moral judgments with reasons for who in the scenario is blameworthy. Prior work has focused on predicting moral judgments from AITA posts and comments. This study introduces the concept of moral sparks—key narrative excerpts that commenters highlight as pivotal to their judgments. Thus, sparks represent heightened moral attention, guiding readers to effective rationales.
Analyzing 24,676 posts and 175,988 comments, we demonstrate that findings from social psychology research on moral judgments extend to real-world scenarios. For example, negative traits (e.g., rude) amplify moral attention, whereas sympathetic traits (e.g., vulnerable) diminish it. Similarly, linguistic features, such as emotionally charged terms (e.g., anger), heighten moral attention, whereas positive or neutral terms (e.g., leisure and bio) attenuate it. Moreover, we find that incorporating moral sparks enhances pretrained language models’ performance on predicting moral judgments, achieving gains in F1 scores of up to 5.5%. These results demonstrate that moral sparks, derived directly from AITA narratives, capture key aspects of moral judgment and perform comparably to prior methods that depend on human annotation or large-scale generative modeling.
State-sponsored information operations (IOs) increasingly influence global discourse on social media platforms, yet their emotional and rhetorical strategies remain inadequately characterized in the scientific literature. This study presents the first comprehensive analysis of toxic language deployment within such campaigns, examining 56 million posts from over 42 thousand accounts linked to 18 distinct geopolitical entities on X/Twitter. Using Google’s Perspective API, we systematically detect and quantify six categories of toxic content and analyze their distribution across national origins, linguistic structures, and engagement metrics, providing essential information regarding the underlying patterns of such operations. Our findings reveal that while toxic content constitutes only 1.53% of all posts, it is associated with disproportionately high engagement and appears to be strategically deployed in specific geopolitical contexts. Notably, toxic content originating from Russian influence operations receives significantly higher user engagement than content from influence operations of any other country in our dataset. Our code is available at https://github.com/shafin191/Toxic_IO.
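For reference, the sketch below shows how a single post could be scored for six toxicity attributes with Google's Perspective API via a plain REST call. The endpoint and response fields follow the public API documentation, and the six attributes match the number of categories analyzed, but the exact attribute set, batching, rate limiting, and multilingual handling used in the study are not reproduced.

```python
# Hedged sketch of scoring one post with the Perspective API; the attribute
# list is an assumption about which six categories were used.
import requests

API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

ATTRIBUTES = ["TOXICITY", "SEVERE_TOXICITY", "INSULT",
              "PROFANITY", "THREAT", "IDENTITY_ATTACK"]


def toxicity_scores(text: str, lang: str = "en") -> dict:
    body = {
        "comment": {"text": text},
        "languages": [lang],
        "requestedAttributes": {attr: {} for attr in ATTRIBUTES},
    }
    response = requests.post(URL, json=body, timeout=30)
    response.raise_for_status()
    scores = response.json()["attributeScores"]
    return {attr: scores[attr]["summaryScore"]["value"] for attr in ATTRIBUTES}
```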
Screenshots of social media posts are a common approach for information sharing. Unfortunately, before sharing a screenshot, users rarely verify whether the attribution of the post is genuine or fabricated. There are numerous legitimate reasons to share screenshots; however, sharing screenshots of social media posts is also a vector for mis-/disinformation spread on social media. We explore methods to verify the attribution of a social media post shown in a screenshot, using resources found on the live web and in web archives. We focus on the use of web archives, since the attribution of non-deleted posts can be relatively easily verified using the live web. We show how information from a Twitter screenshot (Twitter handle, timestamp, and tweet text) can be extracted and used to locate potential archived tweets in the Internet Archive’s Wayback Machine. We evaluate our method on a dataset of 1,571 single-tweet screenshots.
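The sketch below illustrates the archive-lookup step, assuming the handle and date have already been extracted from the screenshot: it queries the Wayback Machine's CDX API for archived tweet URLs from that account around that day. The URL pattern, parameters, and one-day window are assumptions made for the sketch; matching the tweet text against the candidates is not shown.

```python
# Illustrative lookup of archived tweets via the Wayback Machine CDX API.
import requests

CDX = "https://web.archive.org/cdx/search/cdx"


def candidate_archived_tweets(handle: str, day: str) -> list[str]:
    """Return archived tweet URLs for a handle on a given day (YYYYMMDD)."""
    params = {
        "url": f"twitter.com/{handle}/status/*",
        "from": day,
        "to": day,
        "output": "json",
        "filter": "statuscode:200",
        "collapse": "urlkey",
    }
    rows = requests.get(CDX, params=params, timeout=30).json()
    if not rows:
        return []
    header, entries = rows[0], rows[1:]
    ts, orig = header.index("timestamp"), header.index("original")
    return [f"https://web.archive.org/web/{row[ts]}/{row[orig]}" for row in entries]


print(candidate_archived_tweets("jack", "20060321"))
```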
Spatial Hypertext (SH) has been a long-standing facet of research within the Hypertext field, yet active research is declining despite ongoing interest in the concept. Is this because the question of SH’s nature is resolved, or because it needs re-contextualising? This paper summarises SH’s history for today’s post-Web researcher, surveys current SH research and tools, and considers SH’s possible future.
This work discusses hyperlinked citations to two historical e-literature works articulated as hypercitations within the Game and Interactive Software Scholarship Toolkit (GISST). Hypercitations allow a user to link directly into a particular moment or performance of, for example, an historical program by retrieving a past computational state of that program or an input recording mapped over a series of states. In this case, we provide examples of two works preserved by the Electronic Literature Organization (Judy Malloy’s Uncle Roger and Rob Swigart’s Portal) to illustrate the potential of hypercitation for the discussion, analysis, and sharing of historical hypertextual works. We also note the potential for new forms of hypertextual scholarship provided through the linking and embedding of meta-hypertextual citations.
In his 1994 doctoral dissertation, Andreas Dieberger proposed to visualize hypertexts as cityscapes. Revisiting this concept with the aid of modern processors and displays, we have found that the Information City provides a new perspective on hypertext visualization itself, a practice inspired both by structuralist and by existentialist thought. Where conventional spatial hypertext has tended to focus on proximity, the Information City often foregrounds the implicit semantics of the spaces between buildings.
At the end of 2024, Google released a significant update to its quality ranking guidelines. Positioning is now based on "content appropriateness", defined by four main user intents: informational, transactional, navigation, and localisation.
The language of these guidelines at times feels like a nostalgic call to a utopian web not realised: in its repudiation of SEO tactics, its emphasis on information and knowledge creation, its focus on expertise and original content. We offer a speculative exploration of what a future web might look like, with an eye to particular forms of hypertext.
This study also afforded an opportunity to explore Google’s Gemini AI. Using it as a proxy for the new AI engines implementing these ranking guidelines, we evaluate its ability to understand hypertext narrative. The paper provides two fully developed case studies, as well as a novel methodology to evaluate the AI’s understanding of a more creative future web.
Hypertext and games research have long been intertwined, both in understanding games as a form of hypertext and in identifying ludic interactions and patterns in hypermedia. While this research often seeks to understand theoretical and structural connections, in this paper we seek to go beyond these to build an explicitly hypertextual game engine. We target Mixed Reality (MR) and locative games for our engine as media that have previously been identified as well-suited to sculptural hypertext, and demonstrate the value of hypertextual patterns and structures as part of a game creation toolkit. In this paper we present LUTE (LoGaCulture Unity Toolkit Engine), a technology framework built on top of the Unity 3D game engine for the creation of MR games and locative experiences. LUTE builds on the established state of the art in both MR games and Creativity Support Tools (CSTs) with a modular design that uses hypertextual structures to handle the flow of content nodes in the game, and a declarative order system to specify gameplay in those nodes.
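LUTE itself is built on top of Unity; purely to illustrate the sculptural-hypertext idea it draws on, the Python sketch below models content nodes that become available when declarative conditions over the play state hold. All names and fields are invented for the illustration and do not correspond to LUTE's API.

```python
# Conceptual sketch of sculptural hypertext (not LUTE code): every node is
# potentially available, and declarative conditions over the current state
# sculpt away the ones that cannot yet be presented.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    name: str
    content: str
    condition: Callable[[set], bool] = lambda state: True  # available by default
    grants: set = field(default_factory=set)  # facts added to the state when visited


def available(nodes: list, state: set) -> list:
    """A node is present unless its condition rules it out for the current state."""
    return [n for n in nodes if n.condition(state)]


nodes = [
    Node("intro", "You arrive at the old mill.", grants={"visited_mill"}),
    Node("cellar", "Behind the mill, a cellar door stands ajar.",
         condition=lambda s: "visited_mill" in s, grants={"found_key"}),
    Node("ending", "The key opens the archive room; the mystery is resolved.",
         condition=lambda s: "found_key" in s),
]

state, visited = set(), set()
for _ in range(3):  # a few "turns": present whatever has newly become available
    for node in available(nodes, state):
        if node.name not in visited:
            visited.add(node.name)
            print(node.name, "->", node.content)
            state |= node.grants
```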
This study explores how students of Ancient Greek interact with digital commentaries enhanced by inline reference resolution. Comparing static and hover-based interfaces, we identify distinct behavioral patterns and preferences shaped in part by prior tool use. Findings inform hypertext design strategies that balance cognitive load with access to contextual scholarly information.
The article examines A.D. 2044 or Sexmission, a Polish text-based graphical adventure game (a work of interactive fiction) developed by Roland Pantoła and released by LK Avalon in 1991. This game marks a significant milestone in the early history of Polish game development. The study aims to reconstruct the game’s operating mechanisms and explore how Pantoła navigated the limitations of the 8-bit Atari computer and personal environmental challenges, as a substitute for the technological crises that may come for humanity. It focuses on the extension of the rare Forth programming language and on Pantoła’s proprietary engine. Using a media archaeology perspective, the research references the original floppy disks containing tools and source code. Furthermore, it interprets the game as a political satire reflecting on late-communist Poland’s realities. The work is organized into three main subjects: the tool (Chapter 1), the programmer (Chapter 2), and the creation itself (Chapter 3). This study contributes to the interdisciplinary understanding of computer games’ history and cultural context in Poland.
Scientific discourse on the social web has been shown to compromise the accuracy of scientific findings. Complex scientific claims are uttered in the form of short snippets with "implicit references", i.e., references to scientific publications for which the URLs of the actual studies are never cited. This has led to uninformed online scientific debates on topics such as health pandemics or climate. To enhance social media content, we introduce in this paper the novel task of disambiguation of implicit scientific references, where the goal is to retrieve the original scientific publications implicitly referred to by social media users. We contribute the first formalization, ground-truth corpus, and baselines for the task. With this work, we aim to shape an understanding of implicit references on social media and to lay a foundation for developing and evaluating methods for their disambiguation.
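As a simple illustration of what a retrieval baseline for this task could look like (not necessarily one of the baselines contributed in the paper), the sketch below ranks candidate publications by BM25 similarity between a social-media snippet and the publications' titles and abstracts, using the rank_bm25 package; the candidate publications are toy placeholders.

```python
# Illustrative BM25 retrieval sketch for implicit-reference disambiguation.
from rank_bm25 import BM25Okapi

publications = [
    {"id": "pub1", "title": "Efficacy of a new vaccine in adults",
     "abstract": "A randomized trial measuring vaccine efficacy."},
    {"id": "pub2", "title": "Global surface temperature trends",
     "abstract": "An analysis of long-term climate observations."},
]

corpus = [f"{p['title']} {p['abstract']}".lower().split() for p in publications]
bm25 = BM25Okapi(corpus)


def disambiguate(snippet: str, k: int = 3):
    """Return the top-k candidate publications for a social-media snippet."""
    scores = bm25.get_scores(snippet.lower().split())
    ranked = sorted(zip(publications, scores), key=lambda pair: pair[1], reverse=True)
    return [(p["id"], round(score, 3)) for p, score in ranked[:k]]


print(disambiguate("New study shows the vaccine works in adults"))
```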
This paper introduces HyperSumm-RL, a hypertext-aware summarization and interaction analysis framework designed to investigate human perceptions of social robot leadership through long-form dialogue. The system utilizes a structured Natural Language Processing (NLP) workflow that combines transformer-based long dialogue summarization, leadership style modeling, and user response analysis, enabling scalable evaluation of social robots in complex human-robot interaction (HRI) settings. Unlike prior work that focuses on static or task-oriented HRI, HyperSumm-RL captures and hypertextually organizes dynamic conversational exchanges into navigable, semantically rich representations, allowing researchers to trace interaction threads, identify influence cues, and analyze leadership framing over time. The contributions of this study are threefold: (1) we present a novel infrastructure for summarizing and linking long, multi-turn dialogues using leadership-style taxonomies; (2) we propose an interactive hypertext model that supports relational navigation across conversational themes, participant responses, and robot behavior modes; and (3) we demonstrate the utility of this system in interpreting participant trust, engagement, and expectation shifts during social robot leadership scenarios. The findings reveal how hypertextual workflows can augment HRI research by enabling transparent, interpretable, and semantically grounded analysis of emergent social dynamics.
Chatbot-enabled visualization of, and interaction with, information in collaborative VR environments is currently rather limited. To fill this gap and create a concrete application that combines spatial and virtual concepts of hypertext systems based on the use of LLMs, we present VR-ParlExplorer, a system for virtualizing plenary debates that allows users to interact with virtual members of parliament through chatbots. VR-ParlExplorer is implemented as a plugin for Va.Si.Li-Lab to enable immersion in the dynamics of communication in parliamentary debates. The paper describes the functionality of VR-ParlExplorer and discusses the specifics of the use case it addresses.
In this paper, we analyze a set of three proposals—titled Triptych—which carefully extend HTML to support more generalized hypermedia controls. We evaluate the expressive power of these proposals by demonstrating which user experience patterns they make possible to describe in HTML, and which patterns remain unsupported.
We also introduce the concept of behavioral affordances, which characterize common UX patterns in web applications. Through this analysis of UX patterns, we show that HTML currently lacks a native mechanism for expressing behavioral affordances. Finally, we theorize a mechanism for defining arbitrary behavioral affordances that could fill this expressive gap.