Session 10: (Friday pm) Links for a Better Web
33 Refinement of TF-IDF Schemes for Web Pages using their Hyperlinked Neighboring Pages Kazunari Sugiyama, Kenji Hatano, Masatoshi Yoshikawa, Shunsuke Uemura

Evaluation, Hypertext Structure, Link Analysis, Search Engines, World Wide Web, WWW, Information retrieval, TF-IDF scheme

In IR (Information Retrieval) systems based on the vector space model, the tf-idf scheme is widely used to characterize documents. However, for documents with hyperlink structures, such as Web pages, we believe that the contents of a page can be represented more accurately by exploiting the contents of its hyperlinked neighboring pages. In this paper, we first propose three methods for refining the tf-idf scheme for a target Web page by using the contents of its hyperlinked neighboring pages, and then compare the retrieval accuracy of the proposed methods. Experimental results show that, in general, more accurate feature vectors for a target Web page are obtained when using the contents of its hyperlinked neighboring pages up to the second level in the backward direction from the target page.
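As a rough illustration of the general idea only, the sketch below computes plain tf-idf vectors and then blends a target page's vector with the average vector of its neighboring pages. The abstract does not spell out the three proposed methods, so the blending rule, the function names `tf_idf` and `refine`, and the mixing weight `alpha` are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses relative term frequency times log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def refine(target, neighbors, alpha=0.5):
    """Add alpha times the mean neighbor vector to the target vector.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    refined = dict(target)
    if not neighbors:
        return refined
    for vec in neighbors:
        for term, weight in vec.items():
            refined[term] = refined.get(term, 0.0) + alpha * weight / len(neighbors)
    return refined
```

In this sketch, terms that occur only in neighboring pages gain a nonzero weight in the target's vector, which is one simple way neighboring content could sharpen a sparse page representation.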

34 Enhanced Web Document Summarization Using Hyperlinks Jean-Yves Delort, Bernadette Bouchon-Meunier, Maria Rifqi

Data Mining, Hypertext Structure, World Wide Web, Evaluation, Link Analysis

This paper addresses the issue of Web document summarization. As textual content of Web documents is often scarce or irrelevant and existing summarization techniques are based on it, many Web pages and websites cannot be suitably summarized. We consider the context of a Web document by the textual content of all the documents linking to it. To summarize a target Web document, a context-based summarizer has to perform a preprocessing task, during which it will be decided which pieces of information in the source documents are relevant to the content of the target. Then a context-based summarizer faces two issues: first, the selected elements may deal only partially with the topic of the target; second, they may be related to the target and yet not contain any cues about the content of the target. In this paper we put forward two new summarization-by-context algorithms. The first one uses both the content and the context of the document and the second one is based only on the elements of the context. It is shown that summaries taking into account the context are usually much more relevant than those made only from the content of the target document. Optimal conditions of the proposed algorithms with respect to the sizes of the content and the context of the document to summarize are studied.

35 Link analysis for collaborative knowledge building Harris Wu, Michael D. Gordon, Kurt DeMaagd

Short paper: Data Mining, Link Analysis, Navigation, Dynamic Linking

We present a research project utilizing navigation and hyperlink data to aid collaborative knowledge building. We allow collaborators to personally organize documents and other research resources and make references to them. We combine their personal organizations and references to develop a unified, hierarchical categorization of these resources. We analyze collaborators’ navigations to identify prominent research activities as well as the key documents related to these activities. We examine prominence over time to identify research trends.

36 “Common” Web Paths in a Group Adaptive System Maria Barra, Delfina Malandrino, Vittorio Scarano

Short paper: Adaptive Hypermedia, Link Analysis, Recommendation Systems

In this paper we describe how we use users’ accesses and interactions on pages to discover and recommend relevant Common Paths to a group of users. We collect data using GAS (Group Adaptive System), a social navigation environment we developed [1], and we are currently integrating the “Common Path” navigation tool into the user interface. The goal is to use the Common Path of a subset of users in the system as a recommendation to a user who is not in that subset.
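One possible reading of a "common path" can be sketched as follows: count the contiguous page sequences of a fixed length that appear in the navigation histories of several users, and keep those shared by enough of them. The abstract does not describe how GAS actually computes Common Paths, so the function `common_paths` and its parameters `length` and `min_users` are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def common_paths(sessions, length=3, min_users=2):
    """Return page sequences followed by at least min_users distinct users.

    sessions maps a user name to that user's ordered list of visited pages.
    Paths are contiguous runs of `length` pages, returned as tuples and
    ordered by number of users (descending), then alphabetically.
    """
    users_by_path = defaultdict(set)
    for user, pages in sessions.items():
        for i in range(len(pages) - length + 1):
            users_by_path[tuple(pages[i:i + length])].add(user)
    shared = [p for p, users in users_by_path.items() if len(users) >= min_users]
    return sorted(shared, key=lambda p: (-len(users_by_path[p]), p))
```

A path returned for one subset of users could then be offered as a recommendation to a user outside that subset, in the spirit of the goal stated above.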