Peer-to-Peer Client-Side Web Mining

Project Summary
Peer-to-peer (P2P) systems such as Gnutella, Napster, eMule, Kazaa, and Freenet are becoming increasingly popular for many applications that go beyond downloading music without paying for it. Examples include P2P systems for network storage, web caching, searching and indexing of relevant documents, and distributed network-threat analysis. Novel data integration applications, such as P2P web mining of the data stored in the browser caches of different machines connected via a peer-to-peer network, may revolutionize the business of Internet search engines. A peer-to-peer data clustering algorithm that groups the URLs visited by each user by subject (with due privacy protection), exchanging information with other peers to do so, can be very useful for discovering users' web-usage patterns and for efficient web search. This may help characterize each user based on their browsing pattern and form communities of peers with similar interests. Many other similar information integration and knowledge discovery applications involve data distributed in a P2P network.

Data analysis plays an important role in most non-trivial information integration and retrieval applications. However, most off-the-shelf data analysis and mining techniques are designed for centralized applications where all the data are stored in a single central place. They do not work in a highly decentralized, distributed environment like a P2P network. We need distributed data mining algorithms that are fundamentally decentralized, asynchronous, communication efficient, and scalable.

This research is developing a novel P2P web mining system. It is developing distributed algorithms for an early prototype of a web-browser plug-in to support P2P information retrieval and data analysis.
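
To make the clustering idea above more concrete, here is a minimal Python sketch of one possible decentralized scheme: each peer runs a k-means step on its own data only and then averages its centroids with those of its immediate neighbors, so that raw browsing data never leaves the peer. This is an illustration under stated assumptions, not the project's actual algorithm. The function names, the synchronous rounds, the unweighted averaging, and the representation of each visited page as a small numeric feature vector are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def nearest(point: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def local_update(points: List[Vector], centroids: List[Vector]) -> List[Vector]:
    """One k-means step using only this peer's local data."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in points:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    # A centroid with no local points keeps its previous position.
    return [
        tuple(s / counts[j] for s in sums[j]) if counts[j] else centroids[j]
        for j in range(len(centroids))
    ]


def average_with_neighbors(
    own: List[Vector], neighbor_sets: List[List[Vector]]
) -> List[Vector]:
    """Unweighted average of a peer's centroids with its neighbors' centroids."""
    all_sets = [own] + neighbor_sets
    dim = len(own[0])
    return [
        tuple(sum(cs[j][d] for cs in all_sets) / len(all_sets) for d in range(dim))
        for j in range(len(own))
    ]


def p2p_kmeans(
    data: Dict[str, List[Vector]],
    neighbors: Dict[str, List[str]],
    init: List[Vector],
    rounds: int,
) -> Dict[str, List[Vector]]:
    """Run synchronous rounds; only centroids are exchanged, never raw data."""
    centroids = {peer: list(init) for peer in data}
    for _ in range(rounds):
        local = {p: local_update(data[p], centroids[p]) for p in data}
        centroids = {
            p: average_with_neighbors(local[p], [local[n] for n in neighbors[p]])
            for p in data
        }
    return centroids


# Hypothetical example: two peers, two-dimensional page features, two subjects.
data = {
    "a": [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)],
    "b": [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)],
}
neighbors = {"a": ["b"], "b": ["a"]}
result = p2p_kmeans(data, neighbors, init=[(0.0, 0.0), (10.0, 10.0)], rounds=3)
```

A real system would replace the synchronous loop with asynchronous message passing and would likely weight the averages by cluster sizes. The sketch only shows that peers can converge on shared cluster centers while exchanging summaries rather than the visited URLs themselves.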
 

Publications and Products

Please visit our publications page.


Other Resources

Distributed Data Mining Bibliography (www.cs.umbc.edu/~hillol/DDMBIB)