Polynomial Complexity Computation, Representation Construction, and the Genetic Code

National Science Foundation Regular Grant, IIS-0083946


Abstract:

The gene expression process in nature evaluates the fitness of a DNA through the production of different proteins in different cells. The DNA sequence first produces the mRNA sequence and next the mRNA produces the protein sequence by using a transformation called the genetic code. Finally the protein sequence folds into a three-dimensional structure which determines the fitness of the genome. This proposal points out that genetic code-like transformations introduce very interesting properties to the representation of a genetic fitness function. A recent result by the PI showed that such transformations in binary sequence representation can convert functions with exponentially large description to one that is highly  suitable for polynomial-size approximation under certain practical conditions. Precisely speaking, there exists a class of genetic code-like transformations that can construct  a Fourier representation
with only a polynomial number of terms that are exponentially more significant than the rest when fitter chromosomes are given more copies through a redundant, equivalent representation. This is a very desirable property for efficient induction of  function representation from data which is a fundamental problem in learning, data mining, and optimization.
 

The proposed research will extend this preliminary finding and explore the computation in richer genetic code-like transformations for non-binary representations that are used in natural DNA, mRNA, and proteins.  At this point, there exists no known strong techniques to induce functions  with exponentially large representation to functions with only a polynomial number of terms. If the genetic code offers such properties then it will have a significant impact on the theory of computation, machine learning, data mining, pattern recognition, automated program induction, and optimization. It will also help explaining the scalable adaptive behavior of the natural genetic search.
 
 

The proposed research will have the following primary components:
 


Recent Publications
  1. H. Kargupta, R. Ayyagari, and S. Ghosh. (2003). Learning Functions Using Randomized Expansions: Probabilistic Properties and Experimentations. IEEE Transactions on Knowledge and Data Engineering, Volume 16, Number 8, pages 894-908. (pdf) (A paper on probablistic properties of randomized genetic code-like transformations)