April 7, 2013

KBP chapter 4

Rickvian Aldi / 02PCT / 1601253441
Assignment By Mr. Tri Djoko Wahjono

Review questions


2. Explain the three reasons why lexical analysis is separated from syntax analysis.

1. Simplicity—Techniques for lexical analysis are less complex than those
required for syntax analysis, so the lexical-analysis process can be simpler
if it is separate. Also, removing the low-level details of lexical analysis
from the syntax analyzer makes the syntax analyzer both smaller and
less complex.
2. Efficiency—Although it pays to optimize the lexical analyzer, because
lexical analysis requires a significant portion of total compilation time,
it is not fruitful to optimize the syntax analyzer. Separation facilitates
this selective optimization.
3. Portability—Because the lexical analyzer reads input program files
and often includes buffering of that input, it is somewhat platform
dependent. However, the syntax analyzer can be platform independent.
It is always good to isolate machine-dependent parts of any software
system.


3. Define lexeme and token.

-Formal descriptions of the syntax of programming languages, for simplicity’s
sake, often do not include descriptions of the lowest-level syntactic
units. These small units are called lexemes.

-A token of a language is a category of its lexemes. For example, an identifier is a token
that can have lexemes, or instances, such as sum and total. In some cases, a
token has only a single possible lexeme. Each lexeme group is represented by a name, or token.

Consider the following
Java statement:

index = 2 * count + 17;
The lexemes and tokens of this statement are
Lexemes   Tokens
index     identifier
=         equal_sign
2         int_literal
*         mult_op
count     identifier
+         plus_op
17        int_literal
;         semicolon
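As an illustration (not part of the original text), a minimal tokenizer that produces exactly these lexeme/token pairs can be sketched in Python; the token names come from the table above:

```python
import re

# Token patterns, tried in order; names follow the table above.
TOKEN_SPEC = [
    ("identifier",  r"[A-Za-z][A-Za-z0-9]*"),
    ("int_literal", r"\d+"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
]

def tokenize(source):
    """Return a list of (lexeme, token) pairs, skipping white space."""
    pairs = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        for token, pattern in TOKEN_SPEC:
            m = re.match(pattern, source[pos:])
            if m:
                pairs.append((m.group(), token))
                pos += len(m.group())
                break
        else:
            raise SyntaxError(f"unrecognized character: {source[pos]!r}")
    return pairs

print(tokenize("index = 2 * count + 17;"))
```

Running it on the Java statement yields the eight pairs listed in the table.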


4. What are the primary tasks of a lexical analyzer?

-A lexical analyzer is essentially a pattern matcher. A pattern matcher attempts to
find a substring of a given string of characters that matches a given character pattern.
Pattern matching is a traditional part of computing.
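For example, the character pattern for a name can be written as a regular expression and matched against the input. A sketch using Python's re module (rather than a hand-coded matcher):

```python
import re

# The pattern for a name: a letter followed by letters or digits.
NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")

m = NAME.match("sum2 + total")   # anchored at the start of the input
print(m.group())                 # longest prefix matching the pattern: sum2
```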


6. What is a state transition diagram?

-A state transition diagram, or just state diagram, is a directed graph. The
nodes of a state diagram are labeled with state names. The arcs are labeled with
the input characters that cause the transitions among the states. An arc may also
include actions the lexical analyzer must perform when the transition is taken.
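Such a diagram can be walked directly in code. A sketch (illustrative, not from the text) for a hypothetical two-state diagram recognizing integer literals, with arcs start --digit--> in_int and in_int --digit--> in_int:

```python
def is_int_literal(s):
    """Walk the diagram: start --digit--> in_int --digit--> in_int."""
    state = "start"
    for ch in s:
        if ch.isdigit():
            state = "in_int"   # both states have a digit arc to in_int
        else:
            return False       # no arc for this character: reject
    return state == "in_int"   # accept only if we ended in in_int

print(is_int_literal("17"), is_int_literal("1a"), is_int_literal(""))
```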


7. Why are character classes used, rather than individual characters, for the letter and digit transitions of a state diagram for a lexical analyzer?

-Suppose we need a lexical analyzer that recognizes only arithmetic expressions,
including variable names and integer literals as operands. Assume that
the variable names consist of strings of uppercase letters, lowercase letters, and digits but must begin with a letter. Names have no length limitation. The first
thing to observe is that there are 52 different characters (any uppercase or lowercase letter) that can begin a name, which would require 52 transitions from the transition diagram’s initial state. However, a lexical analyzer is interested only in determining that it is a name and is not concerned with which specific
name it happens to be. Therefore, we define a character class named LETTER
for all 52 letters and use a single transition on the first letter of any name.
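A sketch of this idea (illustrative code, not from the text): the character-class function collapses all 52 letters into the single class LETTER, so the name recognizer needs one letter transition instead of 52:

```python
# Map each character to a class; one LETTER arc then covers all 52 letters.
def char_class(ch):
    if "a" <= ch <= "z" or "A" <= ch <= "Z":
        return "LETTER"
    if "0" <= ch <= "9":
        return "DIGIT"
    return "OTHER"

def is_name(s):
    """A name: a LETTER followed by any mix of LETTERs and DIGITs."""
    if not s or char_class(s[0]) != "LETTER":
        return False
    return all(char_class(ch) in ("LETTER", "DIGIT") for ch in s[1:])

print(is_name("Total9"), is_name("9Total"))  # True False
```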


13. Why are named constants used, rather than numbers, for token codes?

-For the sake of readability of the lexical and syntax analyzers: a named constant describes the token it stands for, while a bare numeric code does not.
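For illustration, named token codes might look like this (the numeric values follow the textbook's example front end, so treat the exact numbers as an assumption; any consistent values would do):

```python
# Named token codes; values follow the textbook's example front end.
INT_LIT     = 10
IDENT       = 11
ASSIGN_OP   = 20
ADD_OP      = 21
SUB_OP      = 22
MULT_OP     = 23
DIV_OP      = 24
LEFT_PAREN  = 25
RIGHT_PAREN = 26

# `next_token == IDENT` reads immediately; `next_token == 11` does not.
print(IDENT, ADD_OP)
```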


14. Describe how a recursive-descent parsing subprogram is written for a rule with a single RHS.

-A recursive-descent subprogram for a rule with a single RHS is relatively
simple. For each terminal symbol in the RHS, that terminal symbol is compared
with nextToken. If they do not match, it is a syntax error. If they match,
the lexical analyzer is called to get the next input token. For each nonterminal, the parsing subprogram for that nonterminal is called.
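A sketch of such a subprogram in Python, for the illustrative single-RHS rule <assign> -> id = <expr> ; (the token codes and the lex/expr helpers here are assumptions made for the sketch, not from the text):

```python
# Illustrative token codes (SEMICOLON's value is made up for this sketch).
IDENT, ASSIGN_OP, SEMICOLON = 11, 20, 30

def assign(next_token, lex, expr):
    """Parse the single-RHS rule  <assign> -> id = <expr> ;  ."""
    if next_token != IDENT:           # compare terminal with next_token
        raise SyntaxError("expected an identifier")
    next_token = lex()                # on a match, get the next token
    if next_token != ASSIGN_OP:
        raise SyntaxError("expected '='")
    next_token = lex()
    next_token = expr(next_token)     # nonterminal: call its subprogram
    if next_token != SEMICOLON:
        raise SyntaxError("expected ';'")
    return lex()

# Demo with stubs: the token stream that follows the leading identifier.
tokens = iter([ASSIGN_OP, IDENT, SEMICOLON, -1])
lex = lambda: next(tokens)
expr = lambda tok: lex()              # stub: <expr> consumes one token
print(assign(IDENT, lex, expr))       # -1 (EOF) after a successful parse
```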


15.What is the FIRST set for a given grammar and sentential form?

FIRST(α) = {a | α =>* aβ} (if α =>* ε, then ε is also in FIRST(α)),
in which =>* means zero or more derivation steps.
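As a sketch, for a toy grammar with no erasing (epsilon) rules (an assumption that simplifies the computation), FIRST of a sentential form can be computed recursively; the rules here are those of problem 1 in the problem set:

```python
# FIRST of a sentential form, for a toy grammar with no epsilon rules.
# Nonterminals are the grammar's keys; anything else is a terminal.
GRAMMAR = {
    "A": ["aB", "b", "cBB"],
    "B": ["aB", "bA", "aBb"],
}

def first(form):
    """Set of terminals that can begin a string derived from `form`."""
    symbol = form[0]              # with no epsilon rules, only the first
    if symbol not in GRAMMAR:     # symbol of the form matters
        return {symbol}
    result = set()
    for rhs in GRAMMAR[symbol]:
        result |= first(rhs)
    return result

print(first("aB"), first("cBB"), first("B"))
```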



18. What is left factoring?

-Left factoring is a grammar transformation applied when the alternatives of a rule share a common prefix, which would otherwise force a parser to backtrack while building the parse/syntax tree.

Consider the following grammar:
A->bcW | bcE
W->x
E->q

Look at the production A->bcW | bcE: the prefix 'bc' is repeated. To parse the string "bcq" we must choose between A->bcW and A->bcE before seeing what follows "bc". Suppose we choose A->bcW; after matching "bc" we try to parse "q" against W->x, fail, and must backtrack and try A->bcE instead.

Left factoring removes this problem: the common prefix is kept in a single alternative, and the differing suffixes are collected under a new nonterminal.
The grammar after applying left factoring is:

A->bcX
X->W | E
W->x
E->q


http://en.wikipedia.org/wiki/Left_factoring


Problem Set

1. Perform the pairwise disjointness test for the following grammar rules.
a. A → aB | b | cBB
b. B → aB | bA | aBb
c. C→ aaA | b | caB

answer :
A -> aB | b | cBB

first(aB) = {a}

first(b) = {b}

first(cBB) = {c}

The FIRST sets are pairwise disjoint, so the rule passes the test.


B -> aB | bA | aBb

first(aB) = {a}

first(bA) = {b}

first(aBb) = {a}

first(aB) and first(aBb) intersect, so the rule fails the test.


C -> aaA | b | caB

first(aaA) = {a}

first(b) = {b}

first(caB) = {c}

The FIRST sets are pairwise disjoint, so the rule passes the test.


2. Perform the pairwise disjointness test for the following grammar rules.
a. S→ aSb | bAA
b. A → b{aB} | a
c. B → aB | a

answer : 
S → aSb | bAA

first(aSb) = {a}

first(bAA) = {b}

The sets are disjoint, so the rule passes the test.

A → b{aB} | a

first(b{aB}) = {b}

first(a) = {a}

The sets are disjoint, so the rule passes the test.

B → aB | a

first(aB) = {a}

first(a) = {a}

first(aB) and first(a) intersect, so the rule fails the test.
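The tests above can be mechanized. A sketch (assuming no epsilon rules, so FIRST is found by following leading symbols) that checks the rules of problem 1:

```python
from itertools import combinations

# Grammar of problem 1; nonterminals are the keys, everything else terminal.
GRAMMAR = {
    "A": ["aB", "b", "cBB"],
    "B": ["aB", "bA", "aBb"],
    "C": ["aaA", "b", "caB"],
}

def first(form):
    """FIRST of a sentential form (no epsilon rules assumed)."""
    symbol = form[0]
    if symbol not in GRAMMAR:
        return {symbol}
    out = set()
    for rhs in GRAMMAR[symbol]:
        out |= first(rhs)
    return out

def pairwise_disjoint(rhss):
    """True if the FIRST sets of the alternatives are pairwise disjoint."""
    return all(first(x).isdisjoint(first(y))
               for x, y in combinations(rhss, 2))

for nt, rhss in GRAMMAR.items():
    print(nt, "passes" if pairwise_disjoint(rhss) else "fails")
# A passes, B fails, C passes
```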



3. Show a trace of the recursive descent parser given in Section 4.4.1 for
the string a + b * c.

Consider the following EBNF description of simple arithmetic expressions:
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
<factor> → id | int_constant | ( <expr> )

answer :

Next token is: 11 Next lexeme is a
Enter <expr>
Enter <term>
Enter <factor>

Next token is: 21 Next lexeme is +
Exit <factor>
Exit <term>

Next token is: 11 Next lexeme is b
Enter <term>
Enter <factor>

Next token is: 23 Next lexeme is *
Exit <factor>

Next token is: 11 Next lexeme is c
Enter <factor>

Next token is: -1 Next lexeme is EOF
Exit <factor>
Exit <term>
Exit <expr>


4. Show a trace of the recursive descent parser given in Section 4.4.1 for
the string a * (b + c).

answer: 

Next token is: 11 Next lexeme is a
Enter <expr>
Enter <term>
Enter <factor>

Next token is: 23 Next lexeme is *
Exit <factor>

Next token is: 25 Next lexeme is (
Enter <factor>

Next token is: 11 Next lexeme is b
Enter <expr>
Enter <term>
Enter <factor>

Next token is: 21 Next lexeme is +
Exit <factor>
Exit <term>

Next token is: 11 Next lexeme is c
Enter <term>
Enter <factor>

Next token is: 26 Next lexeme is )
Exit <factor>
Exit <term>
Exit <expr>

Next token is: -1 Next lexeme is EOF
Exit <factor>
Exit <term>
Exit <expr>
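Traces like these can be generated mechanically. Below is a Python transcription of the Section 4.4.1 parser (the book's version is written in C; the token codes are those of its example front end, so treat the exact values as an assumption). It prints an Enter/Exit trace for pre-lexed input:

```python
# Token codes assumed from the textbook's example front end.
INT_LIT, IDENT = 10, 11
ADD_OP, SUB_OP, MULT_OP, DIV_OP = 21, 22, 23, 24
LEFT_PAREN, RIGHT_PAREN, EOF = 25, 26, -1

class Parser:
    def __init__(self, lexemes):
        # Pre-lexed input: (token_code, lexeme) pairs, EOF appended.
        self.stream = iter(lexemes + [(EOF, "EOF")])
        self.lex()

    def lex(self):
        self.next_token, self.lexeme = next(self.stream)
        print(f"Next token is: {self.next_token} Next lexeme is {self.lexeme}")

    def expr(self):                  # <expr> -> <term> {(+ | -) <term>}
        print("Enter <expr>")
        self.term()
        while self.next_token in (ADD_OP, SUB_OP):
            self.lex()
            self.term()
        print("Exit <expr>")

    def term(self):                  # <term> -> <factor> {(* | /) <factor>}
        print("Enter <term>")
        self.factor()
        while self.next_token in (MULT_OP, DIV_OP):
            self.lex()
            self.factor()
        print("Exit <term>")

    def factor(self):                # <factor> -> id | int_constant | ( <expr> )
        print("Enter <factor>")
        if self.next_token in (IDENT, INT_LIT):
            self.lex()
        elif self.next_token == LEFT_PAREN:
            self.lex()
            self.expr()
            if self.next_token != RIGHT_PAREN:
                raise SyntaxError("expected ')'")
            self.lex()
        else:
            raise SyntaxError("unexpected token")
        print("Exit <factor>")

# Trace for a + b * c
Parser([(IDENT, "a"), (ADD_OP, "+"), (IDENT, "b"),
        (MULT_OP, "*"), (IDENT, "c")]).expr()
```

Swapping in the token stream for a * (b + c) reproduces the trace of problem 4 the same way.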