Intelligent Web Servers as an Approach to the Web Indexing Problem

James Mayfield and Charles Nicholas

Computer Science Department

University of Maryland Baltimore County (UMBC)

Like the rest of the Web community, we have watched the Web explode in size and popularity over the last several months. We have also seen the emergence of robot-based indexing tools, such as Lycos and the World-Wide Web Worm. In our research, funded in part by the US Department of Defense, we are designing what we call "Intelligent Web Servers". These intelligent web servers will address some of the problems of scalability and currency that adversely affect the robot-based approaches.

The Web servers we have in mind are intelligent in the following sense: (1) they possess metadata, i.e. knowledge about the information they can provide, and (2) they can communicate with each other in order to better handle user requests.

With respect to the metadata the servers will have, we are using two approaches, both grounded in our earlier work in this area. In designing and building the SNITCH system, we devoted considerable effort to using semantic nets as a means of constructing hypertext links. For such automatic linking to be effective, the semantic net must be descriptive of the underlying corpus.

In other work related to metadata, we use statistical properties of the text itself, namely the distribution of n-grams, again to construct hypertext links. The TELLTALE system is based on the assumption that documents with similar n-gram profiles are similar in content. We are now working to improve TELLTALE's performance and capacity, and adapting TELLTALE to the WWW is an important aspect of this work.
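The n-gram assumption can be illustrated with a minimal sketch. This is not TELLTALE's implementation: the choice of character trigrams, the cosine measure, and the function names ngram_profile, cosine_similarity and most_similar are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text, n=3):
    """Count overlapping character n-grams (assumed unit; the paper does not specify)."""
    # Lowercase and collapse whitespace so layout differences do not affect the profile.
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two n-gram count vectors (assumed similarity measure)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, documents, n=3):
    """Return the name of the document whose profile is closest to the query's."""
    q = ngram_profile(query, n)
    return max(
        documents,
        key=lambda name: cosine_similarity(q, ngram_profile(documents[name], n)),
    )
```

Given a dictionary mapping document names to their text, most_similar(text_of_X, documents) would be one way to answer a request for documents resembling X. A server handling a large corpus would presumably precompute and store the profiles rather than rebuild them per request.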

In addition to having metadata, our intelligent web servers will be able to communicate with each other. Right now, Web servers don't cooperate with each other in any meaningful sense, with the possible exception of proxy servers and caching. We propose that if, for example, a Web server is asked for an HTML file that it doesn't have, it ask its peer servers whether they have the requested file. This cooperation will, for example, help shield outside users from the possible ill effects of files migrating from one server to another on the same LAN.
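The peer lookup just described can be sketched as follows. This is a toy model under stated assumptions: in-memory objects stand in for networked servers, the class PeerAwareServer and its methods are hypothetical, and the sketch leaves open whether a real server would proxy the file or redirect the client.

```python
class PeerAwareServer:
    """Toy stand-in for a Web server that can consult a list of peer servers."""

    def __init__(self, name, files=None):
        self.name = name
        self.files = dict(files or {})  # path -> document content
        self.peers = []

    def add_peer(self, peer):
        self.peers.append(peer)

    def has_local(self, path):
        return path in self.files

    def handle_request(self, path):
        """Return (name of the server holding the file, content), or None if nobody has it."""
        if self.has_local(path):
            return self.name, self.files[path]
        # Ask each peer only about its local files, so requests never bounce in a loop.
        for peer in self.peers:
            if peer.has_local(path):
                return peer.name, peer.files[path]
        return None
```

For example, if a file has migrated from server a to server b and the two are peers, a request sent to a for that file still succeeds and reports b as the holder, so the outside user's old URL keeps working.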

These intelligent web servers will communicate with each other using KQML, the Knowledge Query and Manipulation Language. The syntax of KQML is well-defined, and a few more-or-less stable implementations are available. Development of the syntax and semantics of KQML continues at UMBC and a number of other places.
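To give a sense of what such an exchange might look like, the sketch below renders a KQML-style message: a parenthesized performative followed by :keyword value pairs. The helper names kqml_message and quote, the server names, the language tag hypothetical-url-query, and the content expression are all invented for illustration; the content language our servers will actually use is not specified here.

```python
def quote(text):
    """Wrap text in double quotes, escaping backslashes and embedded quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def kqml_message(performative, **params):
    """Render a performative and its parameters as a KQML-style string.

    Python keyword names use underscores; they are emitted as hyphenated
    KQML keywords (reply_with becomes :reply-with).
    """
    parts = [performative]
    for key, value in params.items():
        parts.append(":" + key.replace("_", "-") + " " + str(value))
    return "(" + " ".join(parts) + ")"


# Hypothetical query: server-a asks server-b whether it holds a given file.
query = kqml_message(
    "ask-one",
    sender="server-a",
    receiver="server-b",
    reply_with="q1",
    language="hypothetical-url-query",  # placeholder, not a real content language
    content=quote("has-file /papers/index.html"),
)
```

The receiving server would be expected to answer with a message carrying a matching :in-reply-to value, which is how the asker pairs replies with outstanding queries.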

At the moment, our work is very much in progress. We have only recently started to incorporate KQML functionality into a Web server. As yet we have no experimental evidence to offer. However, once we have a small community of these intelligent Web servers, we plan to give each of them some data (and metadata) to manage. We will then run experiments on how well these servers deal with (1) information requests that involve possibly outdated or otherwise broken URLs, and (2) information requests of a more nebulous nature, such as "Show me more HTML documents that resemble document X."


Charles Nicholas          Computer Science Department, UMBC
410-455-2594, -3969 fax   http://ruff.cs.umbc.edu:1080/