Collaborative Research: Privacy-Sensitive Data
Mining from Multi-Party Distributed Data
Project Award
Number
National Science
Foundation Grant IIS-0329143
Collaborative award (WSU) IIS-0350533
Principal
Investigator
Hillol
Kargupta
Department of Computer Science and Electrical Engineering
University of Maryland, Baltimore County
1000 Hilltop Circle,
Baltimore, MD 21250
Voice: (410) 455-3972, Fax : (410) 455-3969, Email: hillol@cs.umbc.edu
PI URL: http://www.cs.umbc.edu/~hillol
Co-PI
Krishnamoorthy Sivakumar
School of Electrical Engineering and
Computer Science
Washington State University
Pullman, WA 99164-2752
Voice: (509) 335-4969, Fax: (509) 335-3818,
Email: siva@eecs.wsu.edu
http://www.eecs.wsu.edu/~siva/
Students
UMBC: Kun Liu, Souptik
Dutta.
WSU: Da Meng, Jianjie
Ma, Wang Qi, Taikun Cheng
Post Docs
UMBC: Ran Wolff, Chris
Giannella
Keyword
Privacy
preserving distributed data
mining
Project Summary
A growing number of
data mining applications need to deal with data sources that are distributed,
possibly proprietary, and sensitive to privacy. Financial transactions,
health-care records, and network communication traffic are a few examples.
Privacy is also becoming an increasingly important issue in data mining
applications for counter-terrorism and homeland defense that may require
creating profiles, constructing social network models, detecting terrorist
communications from distributed privacy sensitive multi-party data. Combining
such diverse data sets belonging to different parties may violate the privacy
laws. Therefore we need algorithms that can mine the data in a distributed
fashion while guaranteeing that the privacy of the data is not
compromised.
This has resulted in the development of several
privacy-preserving data mining techniques. Many of these techniques work using
randomized techniques to perturb the data and preserve the data privacy while
still guaranteeing the invariance of the underlying patterns. This research
pointed out that the popular naive randomization of the data (more specifically,
additive random matrices) may preserve little data privacy in many cases. It
proposes a framework to understand these data masking techniques using the
theory of random matrices to shows the problems of some existing
privacy-preserving data mining techniques and potential research directions for
solving the problems. Current research effort includes exploration of Shanon’s coding theory for the development of a systematic
framework for designing privacy-preserving data mining algorithms.
This proposal also suggests the development of a collection of
randomized techniques that are provably correct in supporting privacy-sensitive
applications. It proposes a randomized projection-based technique to compute
statistical aggregates from multi-party distributed data. It also suggests
research on a privacy-sensitive distributed Bayesian network learning algorithm
for similar applications. The proposed algorithms will be developed, tested, and
evaluated using a collection of testbeds that PIs have
been developing for the last several years. This research will be a collaborative effort between
University of Maryland Baltimore County and
Washington State University. Professor Kargupta will take the lead on the
random matrices and random projection research. Professor Sivakumar will take the lead on the privacy-sensitive
Bayesian network-based distributed data mining.
Publications and
Products
- K. Das, K. Bhaduri, K. Liu, H. Kargupta. (2008). Distributed
Identification of Top-l Inner Product Elements and its Application in
a Peer-to-Peer Network. IEEE Transactions on Knowledge and Data Engineering.
Volume 19, Number 3.
- K. Das, H. Kargupta, K. Liu, (2008). A Game Theoretic Framework for
Practical Privacy Preserving Distributed Data Mining. (In communication)
- H. Kargupta, K. Das, and K. Liu. (2007). A Game Theoretic Approach toward
Multi-Party Privacy-Preserving Distributed Data Mining. In the 11th European
Conference on Principles and Practice of Knowledge Discovery in Databases,
Warsaw, Poland, September 2007.
- K. Liu, C. Giannella, H. Kargupta. (2007). On the Privacy of Euclidean
Distance Preserving Data Perturbation. (In communication).
- K. Liu, K Bhaduri, K. Das, P. Nguyen, H. Kargupta (2006). Client-side Web
Mining for Community Formation in Peer-to-Peer Environments. SIGKDD
Explorations. Volume 8, Issue 2, Pages 11-20.
- K. Liu, H. Kargupta, and J. Ryan. (2006) Random projection-based
multiplicative data perturbation for privacy preserving distributed data
mining. IEEE Transactions on Knowledge and Data Engineering (TKDE), volume 18,
number 1, pp. 92-106, January 2006.
- K. Liu, C. Giannella, and H. Kargupta. (2006). An attacker's view of
distance preserving maps for privacy preserving data mining. In Proceedings of
the 10th European Conference on Principles and Practice of Knowledge Discovery
in Databases (PKDD'06), Berlin, Germany, September 2006.
- J. da Silva, C. Giannella, R. Bhargava, H. Kargupta, and M. Klusch. (2005)
Distributed Data Mining and Agents, Invited submission, Engineering
Applications of Artificial Intelligence Journal. volume 18, pp. 791--807.
- H. Kargupta, K. Liu, S. Datta, J. Ryan, K. Sivakumar(2003) Link Analysis,
Privacy Preservation, and Random Perturbations. Proceedings of
the KDD Workshop on Link Analysis for Detecting Complex Behavior
(LinkKDD’03), Washington D.C., July 2003.
- H. Kargupta, K. Liu, S. Datta, J. Ryan, and K. Sivakumar. (2003). Homeland
security and privacy sensitive data mining from multi-party distributed
resources. In Proceedings of the 12th IEEE International Conference on Fuzzy
Systems, volume 2, pp. 1257-1260, St. Louis, MO, May 2003.
- H. Kargupta, H. Dutta, S. Datta , K. Sivakumar (2003) Privacy Preserving
Data Mining and Random Perturbation. Proceedings of the Workshop on Privacy in
the Electronic Society (WPES'03), Washington DC,October,2003.
- S. Datta, H. Kargupta and K. Sivakumar. (2003) .Homeland Defense,
Privacy-Sensitive Data Mining, and Random Value Distortion. Proceedings of the
SIAM Workshop on Data Mining for Counter Terrorism and Security
(SDM'03),San Francisco, CA, May, 2003.
- D. Meng, K. Sivakumar, and H. Kargupta. (2004). Privacy Sensitive Bayesian
Network Parameter Learning. Proceedings of the Fourth IEEE International
Conference on Data Mining. Brighton, UK, pages 427--430.
- H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar. (2004). Random Data
Perturbation Techniques and Privacy Preserving Data Mining. Knowledge and
Information Systems Journal, volume 7, number 4, pages 387--414.
- H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar. (2003). On the Privacy
Preserving Properties of Random Data Perturbation Techniques. Proceedings of
the IEEE International Conference on Data Mining. Melbourne, Florida, USA,
pages 99--106. (Winner of 2003 IEEE Data Mining Conference Best Paper Award.)
- J. Ma and K. Sivakumar, A Framework of Privacy-Preserving Bayesian Network
Parameter Learning using Post Randomization, In WSEAS Transaction of
Information Technologies, January, 2006.
- J. Ma and K. Sivakumar, Privacy-preserving Bayesian Network Learning from
Heterogeneously Distributed data, In the proceeding of the 2nd International
conference on Data Mining, Las Vegas, NV, (DMIN06), 2006.
- H. Kargupta and K. Sivakumar, (2004) Existential Pleasures of Distributed
Data Mining. Data Mining: Next Generation Challenges and Future Directions.
Editors: H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha. AAAI/MIT Press.
- J. Ma, Learning from perturbed data for privacy-preserving data mining,
PhD Thesis, Washington State University, 2006.
- H. Kargupta, K. Liu , and J. Ryan. (2003). Privacy Sensitive Distributed
Data Mining from Multi-party Data. Proceedings of Intelligence and Security
Informatics, First NSF/NIJ Symposium 2003:336-342, Tucson, AZ. K. Liu, H.
Kargupta, and J. Ryan. (2003).
- Random projection and privacy preserving correlation computation from
distributed data. Technical Report TR-CS-03-24, Computer Science and
Electrical Engineering Department, University of Maryland, Baltimore County.
In communication.
Some of the above publications can be downloaded from this link....