Hypertext Conference 2008

Accepted Technical Papers

David Kolb. Making Revisions Hyper-Visible
Abstract: What should a revised edition of a hypertext be? How might revising a hypertext differ from reissuing a printed book? This essay suggests a revision process that is complex, self-reflexive, and explicitly visible, taking advantage of the ability of hypertext to expand the "margins" of a document in new directions. Where the issues are complex enough, the process of revision should be part of what is presented, not just a machine rumbling in the background that issues in a separate product.
David Kolb. The Revenge of the Page
Abstract: Writers of literary hypertext have urged complexly linked hypertext forms. Some writers have applied this as well to expository and argumentative hypertext, urging texts that are complex and self-reflexive, taking advantage of hypertext's ability to expand the "margins" of a document in new directions. Where argumentative issues or contexts are complex enough, they urge that hypertexts grow towards multi-dimensional expository/argumentative texts with elaborate internal link structures. However, this assumption is challenged by developments on the Web, where a mini-essay style dominates with simple link patterns. This essay analyzes the situation, argues for the viability of more complex hypertexts, and suggests some compositional strategies.
Rasmus Petersen and Uffe Wiil. ASAP: A Planning Tool for Agile Software Development
Abstract: This paper describes the ASAP planning tool. ASAP uses different hypertext structuring mechanisms to provide support for project planning. The design concepts and prototype features are inspired by previous work on structural computing and spatial hypertext. A use scenario demonstrates the capabilities of the tool to support the Blitz Planning activity from the Crystal Clear agile software development methodology. Future work is aimed at broadening the applicability of ASAP towards general project planning.
Klaas Dellschaft and Steffen Staab. An Epistemic Dynamic Model for Tagging Systems
Abstract: In recent literature, several models were proposed for reproducing and understanding the tagging behavior of users. They all assume that the tagging behavior is influenced by the previous tag assignments of other users. But they are only partially successful in reproducing characteristic properties found in tag streams. We argue that this inadequacy of existing models results from their inability to include users' background knowledge into their model of tagging behavior. This paper presents a generative tagging model that integrates both components, the background knowledge and the influence of previous tag assignments. Our model successfully reproduces characteristic properties of tag streams. It even explains effects of the user interface on the tag stream.
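The two components named in this abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' model: the function name `simulate_tag_stream`, the mixing probability `p_background`, and the Zipf-like background weights are all hypothetical choices made for illustration.

```python
import random


def simulate_tag_stream(vocabulary, n_assignments, p_background=0.3, seed=0):
    """Toy generative model: each new tag is either drawn from background
    knowledge or imitated from earlier assignments in the stream."""
    rng = random.Random(seed)
    # Assumed background knowledge: weights decay with rank (Zipf-like).
    weights = [1.0 / (rank + 1) for rank in range(len(vocabulary))]
    stream = []
    for _ in range(n_assignments):
        if not stream or rng.random() < p_background:
            # Background knowledge: pick a word the user already knows.
            tag = rng.choices(vocabulary, weights=weights, k=1)[0]
        else:
            # Imitation: copying a uniformly chosen earlier assignment
            # favours tags that are already frequent in the stream.
            tag = rng.choice(stream)
        stream.append(tag)
    return stream
```

Raising `p_background` makes the stream follow the assumed background distribution more closely, while lowering it strengthens the influence of earlier tag assignments.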
Raquel Recuero. Information Flows and Social Capital in Weblogs: A case study in the Brazilian blogosphere
Abstract: Blogs are tools for publishing information that have become very popular due to the way they facilitate the process of publishing on the Internet. Due to their popularity, blogs influence how information flows in cyberspace. This paper deals with the relations between bloggers' perceived social capital and motivations with the information they choose to publish. Based on a case study of a network of 48 weblogs, 32 interviews and 988 analyzed memes, we show how, for the studied case, information flow is influenced by bloggers' motivations and perceptions.
Frank Allan Hansen and Kaj Grønbæk. Social Web Applications in the City: A Lightweight Infrastructure for Urban Computing
Abstract: In this paper, we describe an infrastructure for browsing and multimedia blogging of Web-based information anchored with physical places in an urban environment. The infrastructure is generic in the sense that it may use any means such as GPS, RFID or 2D-barcodes as ubiquitous link anchors to anchor Web-based information, blogs, and services in the physical environment. The infrastructure is inspired by earlier work on open hypermedia, in the sense that the anchoring and blogging functionality can be integrated to augment arbitrary Web-sites providing information that is relevant to places or objects in the physical world. The blog and anchor functionality is provided as a set of Web services running on a server external to the content server. Experiences and design issues from three cases using Semacode-based physical anchoring to support lightweight urban Web applications are discussed.
Praveen Lakkaraju, Susan Gauch and Mirco Speretta. Document Similarity Based on Concept Tree Distance
Abstract: The Web is quickly moving from the era of search engines to the era of discovery engines. Whereas search engines help you find information you are looking for, discovery engines help you find things that you never knew existed. A common discovery technique is to automatically identify and display objects similar to ones previously viewed by the user. Core to this approach is an accurate method to identify similar documents. In this paper, we present a new approach to identifying similar documents based on a conceptual tree-similarity measure. We represent each document as a concept tree using the concept associations obtained from a classifier. Then, we employ a tree-similarity measure based on a tree edit distance to compute similarities between concept trees. Experiments on documents from the CiteSeer collection showed that our algorithm performed significantly better than document similarity based on the traditional vector space model.
Michael A. Stefanone and Derek Lackaff. We're All Stars Now: Reality Television, Web 2.0, and Mediated Identities
Abstract: Social cognitive theory suggests a likely relationship between the rising popularity of both reality television and social networking sites. This research utilized a survey (N=456) of young adults to determine the extent to which reality television consumption explains user behavior in the context of social network sites. Results show a consistent positive relationship between reality television consumption and the length of time spent logged on to these sites, the size of users' networks, the proportion of friends not actually met face to face, and photo sharing frequency while controlling for age, gender and education. Other categories of television viewing like news, fiction, and educational programming were not related to users' online behavior.
Okan Kolak and Bill N. Schilit. Generating Links by Mining Quotations
Abstract: Scanning books, magazines, and newspapers has become an increasingly widespread activity because people believe that much of the world's information still resides off-line. In general, after these works are scanned, they are indexed for search and processed to add links. This paper describes a new approach to automatically add links by mining popularly quoted passages. Our technique connects elements that are semantically rich, so strong relations are made. Moreover, link targets point within a work rather than to the entire work, facilitating navigation. This paper makes three contributions. First, we describe a scalable algorithm for mining repeated word sequences from extremely large text corpora. Second, we present techniques that filter and rank the repeated sequences for quotations. Third, we present a new user interface for navigating across and within works in the collection using quotation links. Our system has been run on a digital library of over 1 million books and has been used by thousands of people.
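The idea of mining repeated word sequences can be sketched on a tiny in-memory collection. This is a naive illustration, not the scalable algorithm the paper describes: the function name `repeated_sequences`, the fixed window length `n`, and the `min_works` threshold are assumptions introduced here.

```python
from collections import defaultdict


def repeated_sequences(works, n=4, min_works=2):
    """Return word n-grams that occur in at least `min_works` distinct works.

    `works` maps a work identifier to its plain text."""
    seen_in = defaultdict(set)
    for work_id, text in works.items():
        words = text.lower().split()
        # Slide a window of n words over the text and record the work.
        for i in range(len(words) - n + 1):
            seen_in[tuple(words[i:i + n])].add(work_id)
    return {
        " ".join(gram): sorted(ids)
        for gram, ids in seen_in.items()
        if len(ids) >= min_works
    }
```

Each surviving sequence maps to the works that contain it, which is the raw material a filtering and ranking step would need before any sequence is treated as a quotation link.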
Seamus Lawless, Lucy Hederman and Vincent Wade. Enhancing Access to Open Corpus Educational Content: Learning in the Wild
Abstract: The World Wide Web (WWW) provides access to a vast array of interconnected educational content on almost every subject imaginable. A great deal of this content is ideal for incorporation into personalised eLearning experiences. However, the discovery, harvesting and incorporation of appropriate educational materials have proven to be complex and arduous tasks. Traditional educational hypertext systems are based upon the generation of links and anchors between content objects. However, the dynamic incorporation of open corpus educational content in eLearning requires the generation of a relationship between educational concepts and the hypertext documents. One approach to create this overlay between concept and content is to use a Mindmap interface to allow learners to explore and associate hypertext content with knowledge maps of their own creation. This paper presents the Open Corpus Content Service (OCCS), a framework that uses the hypertext structure of the WWW to provide methods of educational content discovery and harvesting. The OCCS semantically examines linked content on both the WWW and in digital content repositories, and creates concept-specific caches of content. The paper also introduces U-CREATe, a novel user-driven Mindmap interface for supporting the exploration and assembly of content cached by the OCCS, in a pedagogically meaningful manner. The combination of these systems benefits both the educator and learner, empowering the learner through ownership of the educational experience and allowing the educator to focus on the pedagogical design of educational offerings rather than content authoring.
Xiaolin Shi, Matthew Bonner, Lada Adamic and Anna Gilbert. The Very Small World of the Well-Connected
Abstract: Online networks occupy an increasingly large position in how we acquire information, how we communicate with one another, and how we disseminate information. Frequently, small sets of vertices dominate various graph and statistical properties of these networks and, because of this, they are relevant for structural analysis and efficient algorithms and engineering. For the web overall, and specifically for social linking in blogs and instant messaging, we provide a principled, rigorous study of the properties, the construction, and the utilization of subsets of special vertices in large online networks. We show that graph synopses defined by the importance of vertices provide small, relatively accurate portraits, independent of the importance measure, of the larger underlying graphs and of the important vertices. Furthermore, they can be computed relatively efficiently.
Hyun Chul Lee, Allan Borodin and Leslie Goldsmith. Extracting and Ranking Viral Communities through Semantic Similarity
Abstract: We study the community extraction problem within the context of networks of blogs and forums. When starting from a small set of known seed nodes, we argue that the use of semantic content information (beyond explicit link information) plays an essential role in the identification of the relevant community. Our approach lends itself to a new and insightful ranking scheme for members of the extracted community and an efficient algorithm for inflating/deflating the extracted community. Using a considerably large commercial data set of blog and forum sites, we provide experimental evidence to demonstrate the utility, efficiency, and stability of our methods.
Johannes Albertsen and Niels Olof Bouvin. User-defined Structural Searches in MediaWiki
Abstract: Wikipedia has been the poster child of user contributed content using the space of MediaWiki as the canvas on which to write. While supremely suitable for authoring simple hypermedia documents, MediaWiki does not lend itself easily to let the author create dynamically assembled documents, or create pages that monitor other pages. While it is possible to create such "special" pages, it requires PHP coding and thus administrative rights to the MediaWiki server. We present in this paper work on a structural query language to allow users to add dynamically evaluated searches to ordinary wiki-pages.
Munmun De Choudhury, Hari Sundaram, Ajita John and Doree Seligmann. Can Blog Communication Dynamics be correlated with Stock Market Activity?
Abstract: In this paper, we develop a simple model to study and analyze communication dynamics in the blogosphere and use these dynamics to determine interesting correlations with stock market movement. This work can drive targeted advertising on the web as well as facilitate understanding community evolution in the blogosphere. We describe the communication dynamics by several simple contextual properties of communication, e.g. the number of posts, the number of comments, the length and response time of comments, strength of comments and the different information roles that can be acquired by people (early responders / late trailers, loyals / outliers). We study a "technology-savvy" community called Engadget (http://www.engadget.com). There are two key contributions in this paper: (a) we identify information roles and the contextual properties for four technology companies, and (b) we model them as a regression problem in a Support Vector Machine framework and train the model with stock movements of the companies. Interestingly, we observe that the communication activity on the blogosphere has considerable correlations with stock market movement. These correlation measures are further cross-validated against two baseline methods. Our results are promising, yielding about 78% accuracy in predicting the magnitude of movement and 87% for the direction of movement.
Munmun De Choudhury, Hari Sundaram, Ajita John and Doree Seligmann. Dynamic Prediction of Communication Flow Using Social Context
Abstract: In this paper, we develop a temporal representation framework for communication and social context to efficiently predict communication flow in social networks. The problem is important because it facilitates determining social and market trends as well as efficient information paths among people. We describe communication flow by two parameters: the intent to communicate and communication delay. There are three key contributions in this paper. (a) To estimate the intent and delay, we design features to characterize communication and social context. Communication context refers to the attributes of current communication. Social context refers to the patterns of participation in communication (information roles) and the degree of overlap of friends between two people (strength of ties).  (b) A subset of optimal features of the communication and social context is chosen at a given time instant using five different feature selection strategies. (c) The features are thereafter used in a Support Vector Regression framework to predict the intent to communicate and the delay between a pair of individuals. We have excellent results (~12% prediction error) on a real world dataset from the largest social networking site, www.myspace.com. We observe interestingly that while context can reasonably predict intent, delay seems to be more dependent on personal contextual changes and latent factors, e.g. "age" of information and presence of cliques among people.
Robert Jaeschke, Beate Krause, Andreas Hotho and Gerd Stumme. Logsonomy - Social Information Retrieval with Logdata
Abstract: Social Bookmarking systems constitute an established part of the Web 2.0. In such systems users describe bookmarks by keywords called tags. The structure behind these social systems, called folksonomies, can be viewed as a tripartite hypergraph of user, tag and resource nodes. This underlying network shows specific structural properties that explain its growth and the possibility of serendipitous exploration.
Today's search engines represent the gateway to retrieve information from the World Wide Web. Short queries typically consisting of two to three words describe a user's information need. In response to the displayed results of the search engine, users click on the links of the result page as they expect the answer to be of relevance.
This clickdata can be represented as a folksonomy in which queries are descriptions of clicked URLs. The resulting network structure, which we will term logsonomy, is very similar to the one of folksonomies. In order to find out about its properties, we analyze the topological characteristics of the tripartite hypergraph of queries, users and bookmarks on a large snapshot of del.icio.us and on query logs of two large search engines. All of the three datasets show small world properties. The tagging behavior of users, which is explained by preferential attachment of the tags in social bookmark systems, is reflected in the distribution of single query words in search engines. We can conclude that the clicking behaviour of search engine users based on the displayed search results and the tagging behaviour of social bookmarking users is driven by similar dynamics.
Scott Bateman, Carl Gutwin and Miguel Nacenta. Seeing Things in the Clouds: The Effect of Visual Features on Tag Cloud Selections
Abstract: Tag clouds are a popular method for visualizing and linking socially-organized information on websites. Tag clouds represent variables of interest (such as popularity) in the visual appearance of the keywords themselves using text properties such as font size, weight, or colour. Although tag clouds are becoming common, there is still little information about which visual features of tags draw the attention of viewers. As tag clouds attempt to represent a wider range of variables with a wider range of visual properties, it becomes difficult to predict what will appear visually important to a viewer. To investigate this issue, we carried out an exploratory study that asked users to select tags from clouds that manipulated nine visual properties. Our results show that font size and font weight have stronger effects than intensity, number of characters, or tag area; but when several visual properties are manipulated at once, there is no one property that stands out above the others. This study adds to the understanding of how visual properties of text capture the attention of users, indicates general guidelines for designers of tag clouds, and provides a study paradigm and starting points for future studies. In addition, our findings may be applied more generally to the visual presentation of textual hyperlinks as a way to provide more information to web navigators.
Conor Gaffney, Declan Dagger and Vincent Wade. A State of the Art Survey of Soft Skill Simulation Authoring Tools
Abstract: Online simulations are a convenient and efficient means of delivering engaging educational experiences. Soft skill simulations are particularly immersive and have many advantages including their educational effectiveness. Typically these simulations combine hypertext and media files to create a realistic model of real world social situations. The key impediment to their mainstream adoption is the complexity involved in their authoring. This paper focuses on a state of the art survey of existing authoring tools used to compose online soft skill simulations. It gives a detailed account of three carefully selected composition tools and also identifies and describes the key requirements needed to author soft skill simulations. While there has been previous literature describing composition tools used to author simulations, none have focused specifically on the authoring of online soft skill simulations.
Dong Zhou, Mark Truran, Tim Brailsford, Helen Ashman and Amir Pourabdollah. LLAMA-B: Automatic Hyperlink Authoring in the Blogosphere
Abstract: Viewed collectively, the sum of all blog entries recorded to date (usually referred to as the blogosphere) represents a prodigiously rich collection of commentary and opinion, a dizzying mixture of fact and speculation, subjective opinion and objective data. This paper introduces a hypermedia authoring tool intended to simplify the process of navigating this chaotic environment. The tool works by adding additional hyperlinks to blogs, links which connect blog entries addressing similar topics. These hyperlinks are generated by an algorithm that uses statistical language modeling and graph based analysis to exploit the implicit associative structure of the blogosphere. An evaluative exercise, centred upon the unsupervised labeling of blog articles, confirms the effectiveness of this approach.
Ed H. Chi and Todd Mytkowicz. Understanding the Efficiency of Social Tagging Systems using Information Theory
Abstract: Given the rise in popularity of social tagging systems, it seems only natural to ask how efficient is the organically evolved tagging vocabulary in describing underlying document objects? Does this distributed process really provide a way to circumnavigate the traditional "vocabulary problem" with ontologies? We analyze a social tagging site, namely del.icio.us, with information theory in order to evaluate the efficiency of this social tagging site for encoding navigation paths to information sources. We show that entropy analysis from information theory provides a natural and interesting way to understand the descriptive encoding power of tags, which appears to be waning. We discuss the implications of our findings and provide insight into how our methods can be used to design more usable social tagging software.
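As a rough illustration of the kind of entropy analysis mentioned in this abstract, the Shannon entropy of a tag distribution can be computed as below. This is a minimal sketch, not the authors' analysis; the function name `tag_entropy` and the choice to measure only the marginal tag distribution are assumptions.

```python
import math
from collections import Counter


def tag_entropy(tags):
    """Shannon entropy (in bits) of the empirical tag distribution."""
    counts = Counter(tags)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed tags.
    return -sum(
        (count / total) * math.log2(count / total)
        for count in counts.values()
    )
```

Four equally frequent tags give an entropy of 2 bits, whereas a stream dominated by a single tag approaches 0 bits, meaning that observing a tag tells a navigator very little.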
Martin Szomszor, Ivan Cantador and Harith Alani. Correlating User Profiles From Multiple Folksonomies
Abstract: As the popularity of the web increases, particularly the use of social networking sites and Web 2.0 style sharing platforms, users are becoming increasingly connected, sharing more and more information, resources, and opinions. This vast array of information presents unique opportunities to harvest knowledge about user activities and interests through the exploitation of large-scale, complex systems. Communal tagging sites, and their respective folksonomies, are one example of such a complex system, providing huge amounts of information about users, spanning multiple domains of interest. However, the current Web infrastructure provides no mechanism for users to consolidate and exploit this information since it is spread over many disparate and unconnected resources. In this paper we compare user tag-clouds from multiple folksonomies to: (a) show how they tend to overlap, regardless of the focus of the folksonomy, (b) demonstrate how this comparison helps in finding and aligning the user's separate identities, and (c) show that cross-linking distributed user tag-clouds enriches users' profiles. During this process, we find that significant user interests are often reflected in multiple Web 2.0 profiles, even though they may operate over different domains. However, due to the free-form nature of tagging, some correlations are lost, a problem we address through the implementation and evaluation of a user tag filtering architecture.
Ben Markines, Heather Roinestad and Filippo Menczer. Efficient Assembly of Social Semantic Networks
Abstract: Social bookmarking has allowed Web users to actively mark up individual Web resources through annotating. Currently, researchers are exploring the use of these markups to create implicit links between online resources. We define an implicit link as a relationship between two online resources established by the Web community. An individual may create or reinforce a relationship between two resources by applying a single tag or organizing them in a common folder. This has led researchers to explore techniques for building networks of resources, categories, and people using the annotations provided by members of social bookmarking systems. On the other hand, in order for these techniques to move from the lab to the real world, efficient building and maintenance of these potentially large networks remains a major obstacle. Methods for assembling and indexing these large networks will allow researchers to run more rigorous assessments of their proposed techniques. Toward this end, we explore an approach taken from the sparse matrix literature and apply it to our system GiveALink. We also investigate distributing the assembly, allowing us to grow the network as the body of resources, annotations, and users grow. Dividing the network is effective for assembling a global network where the implicit links are dependent on global properties. Additionally, we explore a network assembly technique removing the global dependency by introducing an incremental approach where each participant independently contributes to the global network. Finally, we apply and evaluate two measures, mutual information and cosine similarity, to an alternative view of our underlying data.
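Of the two measures named at the end of this abstract, cosine similarity is the simpler to sketch. The example below is an illustration only and makes no claim about how GiveALink stores its data: representing each resource as a dictionary of tag counts, and the function name `cosine_similarity`, are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse tag-count vectors (dicts)."""
    # Only tags present in both vectors contribute to the dot product.
    shared = set(a) & set(b)
    dot = sum(a[tag] * b[tag] for tag in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty vector is treated as similar to nothing.
    return dot / (norm_a * norm_b)
```

Storing only non-zero counts keeps the computation proportional to the number of tags actually applied, which is the same intuition that motivates sparse representations for large networks.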
Adam Jatowt, Yukiko Kawai, Hiroaki Ohshima and Katsumi Tanaka. What Can History Tell Us? Towards Different Models of Interaction with Document Histories
Abstract: The current Web is a transitive collection where little effort is made for versioning and enabling access to historical data for users. As a consequence, users generally do not have enough temporal support when browsing the Web. However, we think that there are many benefits from integrating documents with their histories. For example, a document's history can help to enable time travel and to judge the trustworthiness of the document. In this paper we discuss the possible types of interaction that users could have with document histories and demonstrate several example systems that we have implemented for utilizing this historical data. To support our view, we present the results of an online survey conducted with the objective of investigating user needs for temporal support on the Web. The results indicate quite low usage of Web archives among users yet, at the same time, they emphasize considerable user interest in page histories.
Justin Donaldson, Michael Conover, Ben Markines, Heather Roinestad and Filippo Menczer. Network Visualization for Exploratory Search
Abstract: In this paper, we evaluate the utility of network-based visualizations for facilitating the exploration of a knowledge space, specifically the Web. A force directed network interface was developed to visualize the result sets provided by the social bookmarking application GiveALink. A user study was administered in order to evaluate the effectiveness of our interface for the purpose of exploration. Our study directly compares three approaches for presenting result sets: a conventional ranked result list, a two dimensional network visualization (map), and a hybrid interface combining both list and map aspects. Our results show that users preferred the hybrid interface, and were able to find the same amount of relevant information using fewer queries. We conclude that this behavior is a direct result of the additional structural information present in the network visualization, aiding them in the exploration of the information space.
Franca Garzotto, Davide Bolchini and Paolo Paolini. Investigating Success Factors for Hypermedia Development Tools
Abstract: What are the key factors that contribute to the "success" of a hypermedia development tool? We have investigated this issue in the context of non ICT professional environments (e.g., schools or small museums), which have limited or null "in-house" technical competences and must cope with very limited budget. The paper discusses a set of success factors that can be relevant for hypermedia tools devoted to the above target, and presents a tool for multichannel hypermedia development that we designed in our lab having these factors in mind. We discuss the success of our system and report a wide empirical study in which the different success factors have been measured.
Scott Golder. Measuring Social Networks with Digital Photograph Collections
Abstract: The ease and lack of cost associated with taking digital photographs have allowed people to amass large personal photograph collections. These collections contain valuable information about their owners' social relationships. This paper is a preliminary investigation into how digital photo collections can provide useful data for the study of social networks. Results from an analysis of 23 subjects' photo collections demonstrate the feasibility of this approach, and the relationship between perceived closeness and network position, as well as future questions, are discussed.
Anupriya Ankolekar and Denny Vrandecic. Kalpana -- Enabling Client-Side Web Personalization
Abstract: A growing number of websites are recognizing the value of personalization based on a user's context and social network. As more websites become personalized, the resulting experience for users can be rather fragmented. We aim to facilitate a seamless Web personalization experience across websites by enabling personalization to take place at the client and thus allowing personal information about people to reside locally with people. If websites are to script a personalization experience that draws on information held by the user, it is imperative that this information be easily comprehensible by heterogeneous websites. In this paper, we demonstrate how Semantic Web technologies can be used to realize a vision of client-side Web personalization. The contribution of this paper is an architecture that demonstrates the feasibility of our approach and a prototype implementation that establishes its viability.
Rosta Farzan and Peter Brusilovsky. Where did the Researchers Go? Supporting Social Navigation at a Large Academic Conference
Abstract: Dealing with information overload is an important challenge. Over the last decade researchers have tried to tackle that problem using social technologies. We present a social information access system that helps researchers attending a large academic conference to plan talks they wish to attend. More specifically, we have tried to address the problem of collecting reliable feedback from the community of users. Following a "do it for yourself" approach, the system encourages users to add interesting talks to their individual schedules and uses scheduling information for social navigation support. We also report results of an evaluation of the system at the ELearn 2007 conference.