Gene
Expression: The Missing Link of Evolutionary Computation, from Theory
to Applications in Distributed Data Mining
Hillol Kargupta
Assistant Professor, Department
of Computer Science and Electrical Engineering
University of Maryland Baltimore
County, Baltimore, MD 21250
Voice: (410) 455-3972, Fax : (410)
455-3969
Email: hillol@cs.umbc.edu
PI URL: http://www.cs.umbc.edu/~hillol
Project URL:
http://www.cs.umbc.edu/~hillol/GEXP/gene_expression.html
Link
to a related project on Genetic Codes (http://www.cs.umbc.edu/~hillol/GEXP/genetic_code_project.html).
Project Award Information
Award Number: National Science
Foundation Grant IIS-0196401
Duration: September 15, 1998-2003
Title: Gene Expression: The missing
Link of Evolutionary Computation,
from Theory to Applications in Distributed
Data Mining.
Keywords: Gene
expression, evolutionary computation, function induction, distributed data
mining
Project Summary
Construction of the phenotype from the information coded in the genome is often
called the gene expression process in biology. Transformation of the DNA to the
protein structure plays an important role in this process. Since the structure of
the protein determines the efficacy of the DNA, gene expression can be viewed as
the process of computing the mapping of the genotype to the phenotype. For
some reason, it does not directly compute this function using the DNA itself. It
first transforms the DNA representation to mRNA and subsequently to protein before
evaluating its phenotype. Representation transformations play an important role in
efficient problem solving in many fields such as physics, mathematics,
engineering, machine learning, and optimization. Therefore representation
transformations in gene expression allude intriguing possibilities. However, very
little is understood about the role of gene expression in evolutionary search.
The primary objective of this project is
to understand the representation transformations in gene expression (DNA-->
mRNA-->Protein) from the perspective of search and function induction from
observed data. The project will also foundations for developing gene
expression-based genetic algorithms for scalable optimization, learning, and
distributed data mining.
This research has produced a new randomized algorithm to detect the
decomposed representation of an objective function with n variables. This
algorithm can be effectively used for detecting the search space structure in
large scale optimization problems. Fourier analysis of genetic code-like
transformations provided an intriguing result. It showed that such transformations
can construct a Fourier representation of functions with only a polynomial number
of terms that are exponentially more significant than the rest. This is a
very important property needed for efficient induction of function representation
from data which is a fundamental problem in inductive learning, search, and
optimization. Exploration of randomized genetic code-like transformation has
produced deep insights about the effect of such transformations on efficient
and approximate solving of non-linear search and optimization problems using
linear techniques.
Publications and
Products
Journals
- H. Kargupta, R. Ayyagari, and S. Ghosh. (2003).
Learning Functions Using Randomized Expansions: Probabilistic
Properties and Experimentations. IEEE Transactions on Knowledge and Data Engineering, Volume 16,
Number 8, pages 894-908.
(pdf) (A paper on probablistic properties of randomized genetic
code-like transformations)
-
Kargupta, H. and Park, B. (2001).Gene Expression and Fast Construction of Distributed Evolutionary Representation. Journal of Evolutionary Computation. Vol. 9, no.1, pages 1--32.
-
Kargupta, H. (2000). A
Striking Property of Genetic Code-like Transformations. Complex System
Journal. Vol. 13, no. 1, pages 1--32.
(pdf
format)
-
Kargupta, H. (1999). SEARCH, Computational Processes in Evolution,and Preliminary
Development of the Gene Expression Messy Genetic Algorithm. Journal of
Complex Systems. Volume 11, issue 4, pp 233-287.
-
Kargupta, H. (1999). Scalable Evolutionary
Computation. Journal of Evolutionary Computation. vol 7. no 4. iii--iv.
MIT Press.
-
Kargupta, H. and Bandyopadhyay, S. (1998).
A Perspective On The Foundation And Evolution Of The Linkage Learning Genetic
Algorithms. Accepted in the Special Issue on Genetic Algorithms, the Journal
of Computer Methods in Applied Mechanics and Engineering (CMAME). Guest
Eds: David E. Goldberg and Kalyanmoy Deb.
Conferences
-
Kargupta, H. (2001). Learning through Genetic Codes. Accepted for publication
in theGenetic and Evolutionary Computation Conference Workshop on the Computation
in Gene Expression.
-
Kargupta, H. (2000). Computation in Genetic Code-Like Transformations.
Proceedings of the Genetic and Evolutionary Computation Conference. AAAI
Press.
-
Kargupta, H. and Park, B. (1999). Fast
Construction of Distributed and Decomposed Evolutionary Representation.
Late Breaking Proceedings of the Genetic and Evolutionary
Computation Conference. 139--148. AAAI Press.
-
Kargupta, H. and Sarkar, K. (1999).
Function Induction, Gene Expression, and Evolutionary Representation Construction.
Proceedings of the Genetic and Evolutionary Computation Conference. vol
1. 313--320. AAAI Press.
-
Kargupta, H. & Stafford, B. (1997).
From DNA to Protein: Transformations and Their Possible Role in Linkage
Learning. Proceedings Of The International Conference On Genetic Algorithms,
Michigan. Morgan Kaufmann Publishers. 409--416.
-
Kargupta, H. (1996). Computational Processes
of Evolution: The SEARCH Perspective. Presented in SIAM Annual Meeting,
1996 as the winner of the 1996 SIAM Annual Best Student Paper Prize.
Book Chapters
-
Kargupta, H. (2000). Representation Constructions in Gene Expression. Submitted
to the Foundation of Genetic Algorithms. Morgan Kaufmann Pubs.
-
Kargupta, H. (2000). Gene Expression and Scalable Genetic Search. To be
published in Theory and application of evolutionary computation: recent
trends. Eds:Ashish Ghosh and Shigeyeoshi Tsutsui, Springer-Verlag, UK.
-
Kargupta, H. and Bandyopadhyay, S. (1998).
Further Experimentations on the Scalability of the GEMGA. Lecture Notes
in Computer Science. Vol. 1498. 315-324. Springer Verlag.
-
Kargupta, H. (1998). Gene Expression
and Large Scale Evolutionary Optimization. Computational Aerosciences in
the 21st Century. Kluwer Academic Publishers.
-
Kargupta, H. (1997). Gene Expression:
The Missing Link in Evolutionary Computation. Chapter 4. Genetic
Algorithms in Engineering and Computer Science. Eds: D. Quagliarella, J.
Periaux, C. Poloni and G. Winter. John Wiley & Sons Ltd. 59--83.
Workshops and Special Issues Organized
by the PI
1. Journal
of Genetic Programming and Evolvable Hardware Journal special
issue on Computation in Gene Expression.
2. Complex
Systems Journal Special Issue on the Computation
in Gene Expression
3. GECCO-2000
Workshop, Gene Expression: Missing Link in Evolutionary Computation
4. Special
Issue of Journal of Evolutionary Computation on Scalable Evolutionary Computation
(vol 7, Number 4, 1999)
Project Impact
1. Human Resources: Four
graduate students are alternately supported by this project with complementary
support from the department. Two undergraduate students were employed by
the PI and one of them is now a graduate student working on this project.
One of the graduate students is an woman.Three of them are working toward
Ph.D. and one toward M.S.
2. Development of the Genetic Algorithms course (cpts 549, School
of EECS, Washington State University).
3. Development of the School of EECS DIADIC laboratory comprised
of four Linux PCs, several Sparc machines, and one HP printer. This grant
money is also used for supporting the everyday maintenance of these machines
by the School of EECS systems maintenance group.
4. Development of the Genetic Algorithms course (CSEE, University
of Maryland Baltimore County).
5. Development of a research laboratory at CSEE, University
of Maryland Baltimore County.
Goals, Objectives,
and Targeted Activities
The goal of this research
project is to develop an efficient gene expression-based scalable evolutionary
algorithm for search, optimization, and machine learning. This work will
also apply the developed algorithms for distributed data mining. The approach
consists of,
1. abstracting the
transcription (DNA->mRNA), translation (mRNA->Protein sequence), and folding
(sequence to 3D folded structure of protein) operations in
the light of probabilistic and approximate representation construction;
2. development of search
operators using the developed understanding
3. experimentation
with large scale optimization and incorporation of these mechanisms into
an experimental distributed data mining system for discovering knowledge
from distributed data.
Upon successful completion, this
work will revolutionize the field of evolutionary adaptive algorithms.
It will offer a scalable approach for solving large optimization, machine
learning, and data mining problems. Moreover, it will offer a fundamental
understanding of the natural gene expression process from the perspective
of evolutionery computation.
Area Background
Little work has been done that explores
the role of gene expression from the perspective of scalable learning and
optimization. This work will lay the foundation of the research in this
area. For a review of the related literature please see (Kargupta, 1999)
H. Kargupta, R. Ayyagari, and S. Ghosh. (2003).
Learning Functions Using Randomized Expansions: Probabilistic
Properties and Experimentations. IEEE Transactions on Knowledge and Data Engineering,
Volume 16, Number 8, pages 894-908.
(pdf)
Area References
1. Goldberg, D. (1989).
Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.
New York
2. Holland, J. (1975).
Adaptation in Natural and Artificial Systems. University of Michigan Press.
Ann Arbor
3. Kargupta, H. (1997). Gene Expression:
The Missing Link In Evolutionary Computation. Chapter 4. Genetic
Algorithms in Engineering and Computer Science. Eds: D. Quagliarella, J.
Periaux, C. Poloni and G. Winter. John Wiley & Sons Ltd. 59--83.
Potential Related
Projects:
Extension of the analysis of the
binary genetic code to higher cardinality representations, techniques for
adaptively constructing genetic-code like representation transformations,
applications of the developed technqiues for large scale optimization and
machine learning.