Introduction
Advances in computing and communication over wired and wireless networks have
resulted in many pervasive distributed computing environments. The Internet,
intranets, local area networks, ad hoc wireless networks, and sensor networks
are some examples. These environments often come with different distributed
sources of data and computation. Mining in such
environments naturally calls for proper utilization of these distributed
resources. Moreover, in many privacy-sensitive applications, different,
possibly multi-party, data sets collected at different sites must be
processed in a distributed fashion without collecting everything at a
single central site. However, most off-the-shelf data mining systems are
designed to work as monolithic centralized applications. They normally
download the relevant data to a centralized location and then perform the
data mining operations. This centralized approach does not work well in
many of the emerging distributed, ubiquitous, possibly privacy-sensitive
data mining applications.
Distributed Data Mining (DDM) offers an alternative approach to address this
problem of mining data using distributed resources. DDM pays careful attention to
the distributed resources of data, computing, communication, and human
factors in order to use them in a near-optimal fashion.
DDM applications come in different flavors. When the data can be freely and
efficiently transported from one node to another without significant overhead,
DDM algorithms may offer better scalability and response time by (1) properly
redistributing the data across different partitions, (2) distributing the
computation, or (3) combining both. These algorithms often rely on
fast communication between participating nodes. However, when the data
sources are distributed and cannot be transmitted freely over the network
due to privacy constraints, bandwidth limitations, or scalability problems,
DDM algorithms work by avoiding or minimizing communication of the raw data.
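As a minimal sketch of what "avoiding communication of the raw data" can look like, the example below computes a global mean and variance from per-site summaries. It is an illustration under stated assumptions, not one of the algorithms listed on this page: the names LocalSummary, summarize, and combine are hypothetical, the data is made up, and the sites are assumed to hold horizontally partitioned values of a single numeric attribute. Each site sends three numbers to a coordinator regardless of how many records it holds.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class LocalSummary:
    """Hypothetical per-site message: three numbers instead of the raw records."""
    count: int
    total: float
    total_sq: float


def summarize(values: Iterable[float]) -> LocalSummary:
    """Runs at each data site; only the returned summary leaves the site."""
    count, total, total_sq = 0, 0.0, 0.0
    for v in values:
        count += 1
        total += v
        total_sq += v * v
    return LocalSummary(count, total, total_sq)


def combine(summaries: List[LocalSummary]) -> Tuple[float, float]:
    """Runs at the coordinator; returns the global (mean, population variance)."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no data at any site")
    mean = sum(s.total for s in summaries) / n
    variance = sum(s.total_sq for s in summaries) / n - mean * mean
    # Guard against tiny negative values caused by floating-point rounding.
    return mean, max(variance, 0.0)


# Three sites holding horizontally partitioned data (made-up values).
sites = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
global_mean, global_var = combine([summarize(s) for s in sites])
```

The result matches what a centralized computation over all six values would give, while the communication cost per site stays constant. Note that summaries of this kind reduce traffic but are not by themselves a privacy guarantee, since aggregate statistics can still reveal information about small sites.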
The distributed data mining research of the DIADIC Laboratory, University of
Maryland at Baltimore County, is directed toward the development of this
new technology. Our current research is focused on the following areas:
DDM in mobile and wireless environments
DDM for large-scale scientific, business, and grid applications
Privacy-preserving DDM from multi-party data
Our research considers both DDM algorithm and system development. Some of our
current and past research contributions are listed below:
Algorithmic Research
Distributed Principal Component Analysis (PCA) from Heterogeneous Data
Distributed PCA-based Clustering from Heterogeneous Data
Distributed multivariate regression
Distributed Bayesian Network Learning
Distributed Hierarchical Clustering
Distributed classifier learning
Distributed privacy-preserving data mining from multi-party data using additive and multiplicative noise
Using the Fourier spectrum of decision trees for aggregating decision tree ensembles
Distributed representation construction in the biological process of gene expression and
its algorithmic understanding
Power consumption characteristics of distributed data mining algorithms and developing
data mining algorithms that minimize power consumption
System Development and Applications
PArallel Data Mining Agents (PADMA) System (1997) (a parallel data mining system)
BODHI (1998) System (an early general purpose DDM system)
Mobile Data Mining (MobiMine) System (for monitoring stock market data streams using mobile devices)
VEhicle DAta Stream mining (VEDAS) System (2002) (for monitoring mobile data sensors in vehicles for distributed fleet management applications)
Publications on the work listed above can be found here....