2004 IPDPS Conference
18th International Parallel and Distributed Processing Symposium
April 26–April 30, Santa Fe, New Mexico
Eldorado Hotel
TUTORIAL TITLE: An Introduction to Distributed Data Mining
PRESENTER:
Hillol Kargupta, PhD
Associate Professor
Computer Science and Electrical Engineering Department
University of Maryland Baltimore County
e-mail: hillol AT cs DOT umbc DOT edu
http://www.cs.umbc.edu/~hillol
President, AGNIK, LLC.
1450 S Rolling Road,
e-mail: Hillol AT agnik DOT com
OVERVIEW
Advances in computing and communication over wired and wireless networks have
produced many pervasive distributed computing environments. The Internet,
intranets, local area networks, ad hoc wireless networks, and sensor networks
are some examples. These environments often come with different distributed
sources of data and computation, and mining in such environments naturally
calls for proper utilization of these distributed resources. Moreover, in many
privacy-sensitive applications, different, possibly multi-party, data sets
collected at different sites must be processed in a distributed fashion without
collecting everything at a single central site. However, most off-the-shelf
data mining systems are designed to work as monolithic centralized
applications. They normally download the relevant data to a centralized
location and then perform the data mining operations there. This centralized
approach does not work well in many of the emerging distributed, ubiquitous,
possibly privacy-sensitive data mining applications.
The field of Distributed Data Mining (DDM) offers an alternative. It pays
careful attention to the distributed resources of data, computing,
communication, and human factors in order to use them in a near-optimal
fashion. This tutorial will offer an introduction to the emerging field of
Distributed Data Mining. Attendees will be exposed to the following aspects of
this field:
1) An overview of the emerging DDM applications
2) An overview of the existing DDM algorithms
3) More detailed discussion of some important DDM algorithms
4) An overview of the systems research issues in DDM
5) Detailed case study of an existing DDM system and hands-on demonstration
6) Future directions
7) Pointers to more advanced material and resources
TUTORIAL OUTLINE
1. Distributed data mining (DDM) in a ubiquitous environment: An overview
   a) Motivation (5mins)
   b) Some of the Emerging Applications: (10mins)
      i) Large-scale distributed grid-based applications
      ii) Wireless applications
      iii) Privacy-preserving applications
   c) Challenges: (5mins)
      i) Algorithmic issues
      ii) Systems issues
      iii) Communication issues
      iv) Security issues.
2. Algorithms and architectures: (1hr 30mins)
   a) Distributed data mining algorithms:
      i) Computing statistical aggregates in a distributed manner
      ii) Distributed principal component analysis
      iii) Distributed clustering
      iv) Distributed Bayesian algorithms
      v) Distributed classifier/predictive-model learning: Decision tree learning, multi-variate regression
   b) Architectures: (20mins)
      i) Distributed and monolithic architectures.
      ii) Multi-agent-based architectures.
3. Communication languages for DDM applications. (5mins)
4. Human-computer interaction issues. (10mins)
5. Applications: Case study of a distributed vehicle fleet mining system. (15mins)
6. Conclusions (5mins)
Since the field is very new and hardly any commercial or academic systems are
available, the presenter may have to use some of the systems developed by his
research group to demonstrate different aspects of this technology.