
Agent-based Information Retrieval

Agents are a very promising technology for information retrieval. Some applications are intelligent IR interfaces, mediated searching and brokering, and clustering and categorization. An agent-based approach means that IR systems can be more scalable, flexible, extensible, and interoperable, using agents that route information, broker requests, and share metadata.

This page is maintained by Ian Soboroff [email], and was last updated on 17 March 1998. The page changes often; an older version, which was less focused on agents for IR, is also available. Please contact me with ideas, comments, corrections, or additions!


Introductory material


Agents for Organizing Information

A significant application of agents in IR is organization. Organized information spaces are easier to search (one hopes!), but finding good organizations, as well as sorting out possibly huge amounts of data, is a nontrivial task. If agents are to organize information for us, they must first understand that information and then structure it in a way that is useful to the user.

Oren Etzioni and Mike Perkowitz have a paper in IJCAI '97 entitled Adaptive Web Sites: an AI Challenge. In it, they propose that web servers will analyze user request patterns, and use this data to dynamically restructure their pages.
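As a rough illustration of the idea, an adaptive site might count which pages are viewed together in the same user session and propose links between frequently co-visited pages that are not yet connected. This is a minimal sketch, not the method from the Etzioni and Perkowitz paper; the session format, the `min_support` threshold, and the example URLs are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_visit_counts(sessions):
    """Count how many sessions visit each unordered pair of pages."""
    counts = Counter()
    for session in sessions:
        # Deduplicate within a session so repeat views are not double-counted.
        pages = sorted(set(session))
        for a, b in combinations(pages, 2):
            counts[(a, b)] += 1
    return counts

def suggest_links(sessions, existing_links, min_support=2):
    """Suggest links between pages often visited together but not yet linked."""
    suggestions = []
    for (a, b), n in co_visit_counts(sessions).most_common():
        if n < min_support:
            break  # most_common() is sorted, so every remaining pair is rarer
        if (a, b) not in existing_links and (b, a) not in existing_links:
            suggestions.append((a, b, n))
    return suggestions

# Hypothetical request log, already split into per-user sessions.
sessions = [
    ["/index", "/courses", "/cmsc421"],
    ["/courses", "/cmsc421", "/cmsc471"],
    ["/index", "/cmsc421", "/cmsc471"],
]
suggestions = suggest_links(sessions, existing_links={("/courses", "/index")})
print(suggestions)
```

A real site would also need to decide where a suggested link should appear; the sketch only identifies candidate pairs.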

Sue Bogar and Ian Soboroff worked on using agents to understand specific types of web pages (in this case, university course descriptions), mark them up with semantic information using SHOE (Simple HTML Ontology Extensions), and draw inferences over the collection for interesting search problems, such as student academic planning.
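To make "academic planning" concrete, suppose an agent has extracted prerequisite facts from course-description pages. It can then infer every course a student still needs before a target course and put them in a valid order. The sketch below is illustrative only; the course numbers and the `PREREQS` table are hypothetical, and it uses Python's standard `graphlib` module (Python 3.9 or later) for the ordering step.

```python
from graphlib import TopologicalSorter

# Hypothetical facts extracted from course-description pages:
# each course maps to the set of its direct prerequisites.
PREREQS = {
    "CMSC 471": {"CMSC 341"},
    "CMSC 341": {"CMSC 202", "CMSC 203"},
    "CMSC 202": {"CMSC 201"},
    "CMSC 203": set(),
    "CMSC 201": set(),
}

def required_before(target, prereqs):
    """Return every course that must be completed before `target` (transitive closure)."""
    needed, stack = set(), list(prereqs.get(target, ()))
    while stack:
        course = stack.pop()
        if course not in needed:
            needed.add(course)
            stack.extend(prereqs.get(course, ()))
    return needed

def plan(target, prereqs, completed=frozenset()):
    """Order the remaining courses so each one follows its prerequisites."""
    todo = (required_before(target, prereqs) | {target}) - set(completed)
    # Only keep prerequisite edges between courses the student still has to take.
    graph = {course: prereqs.get(course, set()) & todo for course in todo}
    return list(TopologicalSorter(graph).static_order())

order = plan("CMSC 471", PREREQS, completed={"CMSC 201"})
print(order)
```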

Daniela Rus has several papers on using agents for document structuring and information access.

The Natural Language Processing lab at the University of Massachusetts (part of the Center for Intelligent Information Retrieval) conducts research into extracting information from unrestricted text documents.

The Aristotle project at Iowa State, the source of several projects such as Phoaks, focuses on developing systems that automatically categorize Web resources.
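One simple way to categorize documents automatically is a nearest-centroid classifier: build a term vector for each category from labeled examples, then assign a new document to the most similar category. This is a swapped-in textbook technique, not the Aristotle project's method; the tokenizer, categories, and training examples below are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * v[term] for term, weight in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def train_centroids(labeled_docs):
    """Sum each category's example vectors (cosine ignores length, so no averaging needed)."""
    centroids = {}
    for text, category in labeled_docs:
        centroids.setdefault(category, Counter()).update(vectorize(text))
    return centroids

def categorize(text, centroids):
    """Assign the category whose centroid is most similar to the document."""
    vec = vectorize(text)
    return max(centroids, key=lambda category: cosine(vec, centroids[category]))

# Hypothetical labeled examples.
TRAINING = [
    ("agents broker search requests", "agents"),
    ("software agents negotiate", "agents"),
    ("digital library catalog collection", "libraries"),
    ("library collection browsing", "libraries"),
]
centroids = train_centroids(TRAINING)
print(categorize("an agent that can broker requests", centroids))
```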


Mediated searching

Many good search tools already exist for large information spaces, such as bibliographies, email addresses, and Web pages. Their sheer number creates another opportunity for agents: meta-searching, also called mediated or facilitated searching. Agents can discover and learn about existing information sources, and provide a uniform interface for general queries. These agents can broker each request to the most appropriate resource, which may not even be known to the user.
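A broker of this kind can be sketched in a few lines: each source advertises a set of descriptive keywords, and the broker forwards a query to whichever sources' descriptions best overlap it. This is a minimal sketch assuming keyword-overlap routing; the `Source` and `Broker` classes, the source names, and their stand-in search functions are hypothetical (Python 3.9 or later for the `set[str]` annotation).

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Source:
    name: str
    keywords: set[str]                   # self-description advertised to the broker
    search: Callable[[str], list]        # the source's own query interface

@dataclass
class Broker:
    sources: list = field(default_factory=list)

    def register(self, source):
        self.sources.append(source)

    def route(self, query, k=1):
        """Pick up to k sources whose advertised keywords overlap the query most."""
        terms = set(query.lower().split())
        ranked = sorted(self.sources, key=lambda s: len(terms & s.keywords), reverse=True)
        # Drop sources with no overlap at all rather than guessing.
        return [s for s in ranked[:k] if terms & s.keywords]

    def ask(self, query, k=1):
        """Forward the query to the chosen sources and merge their answers."""
        results = []
        for source in self.route(query, k):
            results.extend(source.search(query))
        return results

broker = Broker()
broker.register(Source("bib", {"paper", "author", "bibliography"}, lambda q: ["bib: " + q]))
broker.register(Source("people", {"email", "address", "person"}, lambda q: ["people: " + q]))
print(broker.ask("email address for soboroff"))
```

The user never names a source; the broker's knowledge of what each source covers does the routing.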

The Intelligent Software Agents group at Carnegie Mellon have two projects regarding information retrieval. DVINA is an agent for monitoring the web, USENET, or email using both statistical and knowledge-based techniques; as of this update, few details are available. WebMate is a personal Web agent integrating parallel search techniques, relevance feedback, similarity-fetching, offline browsing, etc.

The Softbots group at the University of Washington has been the source of several projects on mediating Internet searches, such as Ahoy and Metacrawler.
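A meta-search engine also has to merge ranked lists from several engines into one, collapsing duplicates. The sketch below uses reciprocal rank fusion, a simple fusion rule substituted here for illustration; it is not MetaCrawler's actual merging algorithm, and the engine result lists are hypothetical.

```python
from collections import defaultdict

def merge_results(result_lists, k=60):
    """Fuse ranked URL lists from several engines with reciprocal rank fusion.

    Each engine contributes 1 / (k + rank) for every URL it returns; URLs found
    by several engines accumulate score, and duplicates collapse into one entry.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from two search engines for the same query.
engine_a = ["u1", "u2", "u3"]
engine_b = ["u2", "u4"]
merged = merge_results([engine_a, engine_b])
print(merged)
```

Because only ranks are used, the engines' incompatible relevance scores never need to be compared directly.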

The Nobots group at the Stanford Robotics Laboratory has subsumed the Nobotics group. They are interested in adaptive IR agents, among other things, and have a collection of publications.

The local bibliography contains many references on collaborative tools for searching.

The KQML agent communication language provides semantics for mediated queries between agents.
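KQML messages are s-expressions: a performative such as `ask-one` followed by keyword parameters like `:sender`, `:receiver`, `:reply-with`, `:language`, `:ontology`, and `:content`. The helper below is a hypothetical sketch that renders such a message; the agent names, ontology name, and query content are made up, and it does no quoting or escaping of parameter values.

```python
def kqml(performative, **params):
    """Render a KQML message as an s-expression string.

    Python keyword names use underscores and are converted to KQML's
    hyphenated parameter names (reply_with -> :reply-with).
    Values are inserted verbatim, so :content must already be well formed.
    """
    parts = [performative]
    for name, value in params.items():
        parts.append(f":{name.replace('_', '-')} {value}")
    return "(" + " ".join(parts) + ")"

# A user's agent asks a broker for one answer to a bibliography query.
query = kqml(
    "ask-one",
    sender="user-agent",
    receiver="broker",
    reply_with="q1",
    language="KIF",
    ontology="bibliography",
    content='(author ?paper "Soboroff")',
)
print(query)
```

The `:reply-with` label lets the broker tie its answer to this request by sending back a message carrying `:in-reply-to q1`.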


Metadata for IR Agents

Agents need some way to process and "understand" their information, both at the level of individual documents or objects and across the collection as a whole. Several different techniques exist for deriving metadata from information.

Statistical approaches, such as n-grams and latent semantic indexing, are particularly interesting for analyzing text objects, because they are independent of the language of the text, are resistant to noise (e.g., misspellings), and allow many known mathematical techniques to be applied to natural language analysis.
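The noise resistance of n-grams is easy to demonstrate: a misspelled word still shares most of its character trigrams with the correct spelling, so their vectors remain similar. This minimal sketch covers character n-grams only (latent semantic indexing is not shown); the padding convention and trigram size are illustrative choices.

```python
import math
from collections import Counter

def ngrams(text, n=3):
    """Character n-gram counts; a space on each side marks the word boundaries."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(u, v):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# "retreival" shares 5 of its 9 trigrams with "retrieval"; "categorize" shares none.
print(cosine(ngrams("retrieval"), ngrams("retreival")))
print(cosine(ngrams("retrieval"), ngrams("categorize")))
```

Nothing here depends on a dictionary or grammar, which is why the same code works unchanged on text in other languages.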

The SHOE project (Simple HTML Ontology Extensions) at UMCP defines a set of HTML tags that can be used to embed semantic markup in web pages. Using SHOE, a personal web page could state that its author is a graduate student in Computer Science at Wossamatta University, working with Prof. J. Q. Hacker, all in a software-readable fashion. Agents can then read these tags, along with an ontology definition, and compile a knowledge base from the marked-up pages.
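Once claims have been read from marked-up pages, the agent's knowledge base can be as simple as a set of (subject, relation, object) triples, each remembering which page asserted it. The sketch below does not parse SHOE's actual tag syntax; the relation names, the subject identifier, and the page URL are hypothetical stand-ins for what an agent might extract from the example page described above.

```python
from collections import defaultdict

class KnowledgeBase:
    """A toy store of (subject, relation, object) claims, tagged by source page."""

    def __init__(self):
        # (subject, relation, object) -> set of URLs of pages making the claim
        self.claims = defaultdict(set)

    def add(self, subject, relation, obj, source):
        self.claims[(subject, relation, obj)].add(source)

    def query(self, subject=None, relation=None, obj=None):
        """Return matching claims in sorted order; None acts as a wildcard."""
        return sorted(
            claim for claim in self.claims
            if (subject is None or claim[0] == subject)
            and (relation is None or claim[1] == relation)
            and (obj is None or claim[2] == obj)
        )

    def sources(self, claim):
        """Which pages asserted this claim (useful for judging trust)."""
        return set(self.claims.get(claim, ()))

kb = KnowledgeBase()
page = "http://www.example.edu/~student/"   # hypothetical page URL
kb.add("page-author", "isa", "GraduateStudent", page)
kb.add("page-author", "department", "Computer Science, Wossamatta University", page)
kb.add("page-author", "worksWith", "Prof. J. Q. Hacker", page)
print(kb.query(relation="worksWith"))
```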

AgentSoft has an XML demo to show how one could use XML in developing semantic markup for querying web resources such as push channels.


Formal Models of Agent-Based IR

T.W.C. Huibers, B. van Linder, and J.-J.Ch. Meyer at Utrecht University, Netherlands, propose a preliminary formalisation of "An Agent-Oriented Approach to Information Retrieval." An abstract is available.


Intelligent Interfaces for IR

A very active area of agent research is intelligent user interfaces. Rather than remaining static, an adaptive user interface changes as it learns about its users, and can also act as an agent on the user's behalf.

The Software Agents group of the MIT Media Laboratory has conducted several projects in constructing agents and collaborative tools for information filtering and discovery.

Marko Balabanovic at Stanford has written the FAB system, which presents a user with web pages that may be of interest. The user rates each page, and these evaluations are used to improve subsequent searches.
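One common way to turn such ratings into better searches is to keep a weighted term profile for the user, nudge it toward pages rated highly and away from pages rated poorly, and rank new candidates by their similarity to the profile. This is a Rocchio-style sketch under stated assumptions, not necessarily FAB's exact update rule; the rating scale of -1 to 1, the learning `rate`, and the example pages are hypothetical.

```python
from collections import Counter

def update_profile(profile, page_terms, rating, rate=0.5):
    """Move the profile toward positively rated pages and away from negative ones.

    rating is assumed to lie in [-1, 1]; rate controls how fast the profile adapts.
    """
    updated = Counter(profile)
    for term, weight in page_terms.items():
        updated[term] += rate * rating * weight
    return updated

def score(profile, page_terms):
    """Dot product between the user profile and a candidate page's term weights."""
    return sum(profile.get(term, 0.0) * weight for term, weight in page_terms.items())

profile = Counter()
profile = update_profile(profile, {"agents": 1, "retrieval": 1}, rating=1)   # liked
profile = update_profile(profile, {"football": 1, "scores": 1}, rating=-1)   # disliked

candidates = {
    "page-c": {"agents": 1, "filtering": 1},
    "page-d": {"football": 1, "news": 1},
}
ranked = sorted(candidates, key=lambda name: score(profile, candidates[name]), reverse=True)
print(ranked)
```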

Alexa is a web-page recommender service, where users rate pages they see. Alexa provides a compact interface which shows recommended "next stops" on the web based on your current page.
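A "next stops" list can be approximated from usage data alone: record, for every page, which page users went to immediately afterwards, and recommend the most frequent successors of the current page. This is a minimal first-order sketch, not Alexa's actual method; the clickstream format and page names are hypothetical.

```python
from collections import Counter, defaultdict

def build_transitions(clickstreams):
    """Count, for each page, which page users visited immediately afterwards."""
    nexts = defaultdict(Counter)
    for stream in clickstreams:
        for here, there in zip(stream, stream[1:]):
            if here != there:  # ignore reloads of the same page
                nexts[here][there] += 1
    return nexts

def next_stops(current, transitions, k=3):
    """Recommend up to k of the most common next pages after `current`."""
    return [page for page, _ in transitions.get(current, Counter()).most_common(k)]

# Hypothetical browsing paths from three users.
streams = [["/a", "/b", "/c"], ["/a", "/b"], ["/a", "/d"]]
transitions = build_transitions(streams)
print(next_stops("/a", transitions))
```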


Digital Libraries

The Digital Library Initiative is a four-year university and industry project supported by NSF, DARPA, and NASA. Agents may be a big part of a digital library: intelligent assistants in browsing, automated cataloguing and organization, and information discovery agents are just a few ideas.

Project centers at Stanford University and the University of Michigan are using agents to construct intelligent interfaces, facilitate high-level searching, organize collections, and support sophisticated searching over specific classes of collections.

Site listings from UMich:


Bibliography

A local bibliography contains many references on ABIR and related topics.