Lecture 13 Context Free Grammars, CFG

  Grammars that have the same languages as DFA's

  A grammar is defined as  G = (V, T, P, S) where
  V is a set of variables. We usually use capital letters for variables.
  T is a set of terminal symbols. This is the same as Sigma for a machine.
  P is a list of productions (rules) of the form:
      variable  ->  concatenation of variables and terminals
  S is the starting variable. S is in V. 

  A string z is accepted by a grammar G if some sequence of rules from P,
  each applied in reverse (a rule's right side replaced by its variable),
  rewrites z into exactly the variable S.

  We say that L(G) is the language generated (accepted) by the grammar G.

  To start, we restrict the productions P to be of the form
      A -> w         w is a concatenation of terminal symbols
      B -> wC        w is a concatenation of terminal symbols
                     A, B and C are variables in V
  and thus get a grammar that generates (accepts) a regular language.

  Suppose we are given a machine M = (Q, Sigma, delta, q0, F) with
  Q = { S }
  Sigma = { 0, 1 }
  q0 = S
  F = { S }
            delta    | 0 | 1 |
                  ---+---+---+
                   S | S | S |
                  ---+---+---+

  This looks strange because we would normally use q0 in place of S.

  The regular expression for M is  (0+1)*

  We can write the corresponding grammar for this machine as
  G = (V, T, P, S) where
  V = { S }     the set of states in the machine
  T = { 0, 1 }  same as Sigma for the machine
  P =
       S -> epsilon | 0S | 1S

  S = S         the q0 state from the machine

  The construction of the rules for P comes directly from M's delta.
  If delta has an entry from state S with input symbol 0 to state S,
  the rule is   S -> 0S.
  If delta has an entry from state S with input symbol 1 to state S,
  the rule is   S -> 1S.

  There is a rule generated for every entry in delta.
  delta(qi,a) = qj  yields a rule  qi -> a qj

  An additional rule qf -> epsilon is generated for each final state qf;
  here that rule is S -> epsilon.
  (An optional encoding is to generate an extra rule for every transition
   to a final state: delta(qi,a) = any final state,  qi -> a
   with this option, if the start state is a final state, the production
   S -> epsilon is still required. )
  See g_reg.g file for worked example.
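  The construction above can be sketched in Python as follows. The dictionary
  encoding of delta and the names dfa_to_grammar and show are hypothetical,
  chosen for this example only, and the sketch assumes delta is complete
  (an entry for every state and symbol). It emits qf -> epsilon for each final
  state and qi -> a qj for each delta entry; the optional qi -> a encoding is
  not shown.

```python
def dfa_to_grammar(states, sigma, delta, q0, finals):
    """Build G = (V, T, P, S) from M = (Q, Sigma, delta, q0, F)."""
    rules = []
    for f in finals:
        rules.append((f, []))                      # qf -> epsilon
    for q in states:
        for a in sigma:
            rules.append((q, [a, delta[(q, a)]]))  # delta(qi,a) = qj gives qi -> a qj
    return set(states), set(sigma), rules, q0

def show(rules):
    """Print the rules grouped by variable, using the "|" shorthand."""
    heads = []
    for head, _ in rules:
        if head not in heads:
            heads.append(head)
    for head in heads:
        bodies = ["".join(body) or "epsilon" for h, body in rules if h == head]
        print(head, "->", " | ".join(bodies))

# The one-state machine M above: Q = F = { S }, Sigma = { 0, 1 }, q0 = S.
delta = {("S", "0"): "S", ("S", "1"): "S"}
V, T, P, S = dfa_to_grammar(["S"], ["0", "1"], delta, "S", ["S"])
show(P)   # S -> epsilon | 0S | 1S
```

  For the one-state machine M this prints the same P as above.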

  The shorthand notation S -> epsilon | 0S | 1S is the same as writing
  the three rules S -> epsilon, S -> 0S and S -> 1S.  Read "|" as "or".

  Grammars can be more powerful (read: accept a larger class of languages)
  than finite state machines (DFA's, NFA's, NFA-epsilon's) and regular expressions.

  For example the language L = { 0^i 1^i | i=0, 1, 2, ... } is not a regular
  language. Yet, this language has a simple grammar
                 S -> epsilon | 0S1

  Note that this grammar violates the restriction needed to make the grammar's
  language a regular language, i.e. a rule body can only be terminal symbols
  followed by at most one variable. The rule S -> 0S1 has a terminal after
  the variable.
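  The grammar S -> epsilon | 0S1 can be read directly as a recursive
  membership test: z is in L(G) if z is empty, or if z = 0w1 with w in L(G).
  A minimal Python sketch, using the hypothetical function name in_L:

```python
def in_L(z):
    """Membership in L = { 0^i 1^i } read off the rules S -> epsilon | 0S1."""
    if z == "":
        return True                      # S -> epsilon
    if len(z) >= 2 and z[0] == "0" and z[-1] == "1":
        return in_L(z[1:-1])             # undo one application of S -> 0S1
    return False

for z in ["", "01", "0011", "000111", "0101", "001", "10"]:
    print(repr(z), in_L(z))
```

  The strings "0101", "001" and "10" are rejected; the others are accepted.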

  A grammar for matching parentheses might be
  G = (V, T, P, S)
  V = { S }
  T = { ( , ) }
  P = S -> epsilon | (S) | SS
  S = S

  We can check this by rewriting an input string:

   ( ( ( ) ( ) ( ( ) ) ) )
   ( ( ( ) ( ) (  S  ) ) )    S -> (S) where the inside S is epsilon
   ( ( ( ) ( )    S    ) )    S -> (S)
   ( ( ( )  S     S    ) )    S -> (S) where the inside S is epsilon
   ( ( ( )     S       ) )    S -> SS
   ( (  S      S       ) )    S -> (S) where the inside S is epsilon
   ( (     S           ) )    S -> SS
   (         S           )    S -> (S)
               S              S -> (S)

   Thus the string ((()()(()))) is accepted by G because the rewriting
   produced exactly S, the start variable.
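   The rewriting check can be automated with a short Python sketch. It
   replaces the right side of a rule by S (a production used in reverse)
   until only S is left or no rule applies. The name accepts and the choice
   to reduce the leftmost match of the first applicable pattern are
   assumptions, so the steps can come in a different order than the hand
   trace above; for this grammar the order does not change the final answer.

```python
# Reverse rules: each right side below is rewritten to the variable S.
#   ()  -> S   from S -> (S) with the inner S = epsilon
#   (S) -> S   from S -> (S)
#   SS  -> S   from S -> SS
REVERSE_RULES = ("()", "(S)", "SS")

def accepts(z, trace=False):
    w = z.replace(" ", "")
    if any(c not in "()" for c in w):
        return False                      # only the terminals ( and ) are allowed
    if w == "":
        return True                       # S -> epsilon
    while w != "S":
        for rhs in REVERSE_RULES:
            if rhs in w:
                w = w.replace(rhs, "S", 1)  # leftmost occurrence only
                if trace:
                    print(w)
                break
        else:
            return False                  # no rule applies: stuck, so reject
    return True

print(accepts("( ( ( ) ( ) ( ( ) ) ) )", trace=True))
```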

   More examples of constructing grammars from language descriptions:

   Construct a CFG for non empty palindromes over T = { 0, 1 }.
   The strings in this language read the same forward and backward.
     G = ( V, T, P, S)  T = { 0, 1 }, V = { S }, S = S, P is below:
     S -> 0 | 1 | 00 | 11 | 0S0 | 1S1
       We started the construction with S -> 0 and S -> 1,
       the shortest strings in the language.
       S -> 0S0  is a palindrome with a zero added to either end
       S -> 1S1  is a palindrome with a one added to either end
       But, we needed  S -> 00  and  S -> 11  to get the even length
       palindromes started.
       "Non empty" means there can be no rule  S -> epsilon.
 
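  One way to check that the four short rules plus S -> 0S0 and S -> 1S1
  produce every non empty palindrome (and nothing else) is to generate all
  derivable strings up to some length and compare them with a brute-force
  list. A Python sketch with hypothetical names generate and brute_force;
  checking up to length 6 is an arbitrary choice and is evidence, not a proof.

```python
from itertools import product

BASE = ("0", "1", "00", "11")   # S -> 0 | 1 | 00 | 11
WRAP = ("0", "1")               # S -> 0S0 | 1S1

def generate(max_len, base=BASE):
    """All strings of length <= max_len derivable from S."""
    found = {s for s in base if len(s) <= max_len}
    frontier = set(found)
    while frontier:
        new = set()
        for s in frontier:
            for c in WRAP:
                t = c + s + c          # one more use of S -> cSc
                if len(t) <= max_len and t not in found:
                    new.add(t)
        found |= new
        frontier = new
    return found

def brute_force(max_len):
    """All non empty strings over {0, 1} of length <= max_len that are palindromes."""
    words = ("".join(p) for n in range(1, max_len + 1)
             for p in product("01", repeat=n))
    return {w for w in words if w == w[::-1]}

print(generate(6) == brute_force(6))   # True
```

  Passing base=("0", "1") drops S -> 00 and S -> 11; the generated set then
  holds only odd-length strings, which is why those two rules are needed.
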
  Construct the grammar for the language L = { a^n b^n | n>0 }
    G = ( V, T, P, S )  T = { a, b }  V = { S }  S = S  P is:
    S -> ab | aSb
    Because n>0 there can be no S -> epsilon
    The shortest string in the language is  ab
    a's have to be on the front, b's have to be on the back.
    When either an "a" or a "b" is added the other must be added
    in order to keep the count the same. Thus  S -> aSb.
    The toughest decision is when to stop adding rules.
    In this case, start "generating" strings in the language:
        S -> ab             ab      for n=1
        S -> aSb           aabb     for n=2
        S -> aSb          aaabbb    for n=3  etc.
    Thus, no more rules are needed.

    "Generating" the strings in a language defined by a grammar
    is also called "derivation" of the strings in a language.
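    As a small illustration of a derivation, this Python sketch (hypothetical
    function derive) lists the sentential forms for a^n b^n under
    S -> ab | aSb: apply S -> aSb n-1 times, then finish with S -> ab.

```python
def derive(n):
    """Sentential forms deriving a^n b^n with S -> ab | aSb (n > 0)."""
    if n < 1:
        raise ValueError("n must be > 0; there is no S -> epsilon")
    forms = ["S"]
    for _ in range(n - 1):
        forms.append(forms[-1].replace("S", "aSb"))   # S -> aSb
    forms.append(forms[-1].replace("S", "ab"))        # S -> ab
    return forms

print(" => ".join(derive(3)))   # S => aSb => aaSbb => aaabbb
```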
    