System Software

Compiler

Compiler: A compiler is a translator program that reads a program written in a high-level language (the source program) and translates it into an equivalent program in machine language (the target program). An important part of the compiler's job is reporting errors in the source program to the programmer.

Executing a program written in a high-level language basically consists of two parts: the source program must first be compiled, that is, translated into an object program; the resulting object program is then loaded into memory and executed.

Compiler Design

The Phases of a Compiler:

A compiler operates in phases. A phase is a logically interrelated operation that takes the source program in one representation and produces output in another representation. The phases of a compiler are described below.

There are two broad phases of compilation:

  • Analysis (the machine-independent part)
  • Synthesis (the machine-dependent part)

The compilation process is partitioned into a number of sub-processes called phases.

Lexical Analysis:

The LA, or scanner, reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens.
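For example, for the source statement SUM := SUM + VALUE (taken from the example program later in this section), the scanner might produce the following sequence of tokens (the descriptive names used here are only illustrative):

SUM       identifier
:=        assignment operator
SUM       identifier
+         plus operator
VALUE     identifier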

Syntax Analysis:

The second stage of translation is called syntax analysis or parsing. In this phase, expressions, statements, declarations, etc. are identified by using the results of lexical analysis. Syntax analysis is aided by techniques based on the formal grammar of the programming language.
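For instance, for the statement SUM := SUM + VALUE, the parser would build a parse tree that can be sketched in indented form as follows (each level of indentation is one level deeper in the tree; the rules used are those of the simplified Pascal grammar given later in this section):

<assign>
    id {SUM}
    :=
    <exp>
        <exp>
            <term>
                <factor>
                    id {SUM}
        +
        <term>
            <factor>
                id {VALUE}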

Intermediate Code Generation:

An intermediate representation of the final machine-language code is produced. This phase bridges the analysis and synthesis phases of translation.
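As an illustration (the exact form of intermediate code differs from compiler to compiler), the statement VARIANCE := SUMSQ DIV 100 - MEAN * MEAN from the example program later in this section might be represented as a sequence of simple three-address operations using compiler-generated temporaries T1, T2 and T3:

T1 := SUMSQ DIV 100
T2 := MEAN * MEAN
T3 := T1 - T2
VARIANCE := T3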

Code Optimization:

This is an optional phase designed to improve the intermediate code so that the final output runs faster and takes less space.
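For example, if the illustrative intermediate code above happened to compute the same subexpression twice,

T1 := SUMSQ DIV 100
T2 := SUMSQ DIV 100

an optimizer could keep only the first computation and replace every later use of T2 with T1 (common subexpression elimination). Similarly, an operation whose operands are all constants, such as T3 := 10 * 10, can be replaced by T3 := 100 at compile time (constant folding). These transformations are only illustrative; the set of optimizations actually performed varies widely between compilers.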

Code Generation

The last phase of translation is code generation. A number of optimizations to reduce the length of the machine-language program are carried out during this phase. The output of code generation is the machine-language program for the specified computer.

Table Management:

This portion of the compiler keeps track of the names used by the program and records essential information about each, such as its type. The data structure used to record this information is called a symbol table.
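As a minimal sketch of the idea (Python is used here purely for illustration, and the attribute names and helper functions are hypothetical, not taken from any particular compiler), a symbol table can be as simple as a mapping from each name to a record of its attributes:

# A minimal illustrative symbol table: it maps each name to a record of its
# attributes.  The attribute set (type, kind, address) is an assumption made
# for this sketch, not a fixed format.
symbol_table = {}

def declare(name, var_type, address):
    # Record a newly declared name; report duplicate declarations as errors.
    if name in symbol_table:
        raise ValueError("duplicate declaration of " + name)
    symbol_table[name] = {"type": var_type, "kind": "variable", "address": address}

def lookup(name):
    # Return the attributes recorded for a name, or None if it was never declared.
    return symbol_table.get(name)

# Entries for the declarations in the STATS program (addresses are arbitrary):
for offset, ident in enumerate(["SUM", "SUMSQ", "I", "VALUE", "MEAN", "VARIANCE"]):
    declare(ident, "INTEGER", offset)

print(lookup("SUM"))      # {'type': 'INTEGER', 'kind': 'variable', 'address': 0}
print(lookup("UNKNOWN"))  # None

A real compiler would also record information such as scope and whether a name denotes a constant, array, or procedure; the dictionary above only illustrates the lookup structure.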

Error Handlers:

The error handler is invoked whenever a flaw in the source program is detected. The output of the LA is a stream of tokens, which is passed to the next phase, the syntax analyzer or parser. The SA groups the tokens together into syntactic structures such as expressions; expressions may further be combined to form statements. The syntactic structure can be regarded as a tree whose leaves are the tokens, called a parse tree.

Code Generator:

The code generator produces the object code by deciding on the memory locations for data, selecting the code to access each datum, and selecting the registers in which each computation is to be done. Many computers have only a few high-speed registers in which computations can be performed quickly. A good code generator attempts to utilize the registers as efficiently as possible.


Basic compiler functions

This section describes the fundamental operations that are necessary for compiling a typical high-level-language program.

For the purpose of compiler construction, a high-level programming language is usually described in terms of a grammar. For example, an assignment statement might be defined by the grammar as a variable name, followed by the assignment operator (:=), followed by an expression, as in:

SUM := 0

SUM := SUM + VALUE;

The problem of compilation then becomes one of matching the statements written by the programmer to the structures defined by the grammar, and generating the appropriate object code for each statement.

Grammars

  • A grammar for a programming language is a formal description of the syntax, or form, of programs and individual statements written in the language.
  • A number of different notations can be used for writing grammars; our examples use BNF (Backus-Naur Form) because of its simplicity.
  • The example grammar given below describes a highly restricted subset of the Pascal language.
  • A BNF grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.

1  PROGRAM STATS
2  VAR
3     SUM, SUMSQ, I, VALUE, MEAN, VARIANCE : INTEGER;
4  BEGIN
5     SUM := 0;
6     SUMSQ := 0;
7     FOR I := 1 TO 100 DO
8        BEGIN
9           READ ( VALUE );
10          SUM := SUM + VALUE;
11          SUMSQ := SUMSQ + VALUE * VALUE
12       END;
13    MEAN := SUM DIV 100;
14    VARIANCE := SUMSQ DIV 100 - MEAN * MEAN;
15    WRITE ( MEAN, VARIANCE )
16 END.

Fig: Example of a Pascal program (STATS)

The corresponding BNF grammar for the above Pascal program can be written as follows.

1  <prog> ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-list> END.
2  <prog-name> ::= id
3  <dec-list> ::= <dec> | <dec-list> ; <dec>
4  <dec> ::= <id-list> : <type>
5  <type> ::= INTEGER
6  <id-list> ::= id | <id-list> , id
7  <stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
8  <stmt> ::= <assign> | <read> | <write> | <for>
9  <assign> ::= id := <exp>
10 <exp> ::= <term> | <exp> + <term> | <exp> - <term>
11 <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
12 <factor> ::= id | int | ( <exp> )
13 <read> ::= READ ( <id-list> )
14 <write> ::= WRITE ( <id-list> )
15 <for> ::= FOR <index-exp> DO <body>
16 <index-exp> ::= id := <exp> TO <exp>
17 <body> ::= <stmt> | BEGIN <stmt-list> END

Fig: Simplified Pascal grammar

Consider, for example, rule 13:

<read> ::= READ ( <id-list> )

This is a definition of the syntax of a Pascal READ statement, which is denoted in the grammar as <read>. The symbol ::= can be read "is defined to be". On the left of this symbol is the language construct being defined, <read>, and on the right is a description of the syntax being defined for it.

  • Character strings enclosed between the angle brackets < and > are called non-terminal symbols.
  • Entries not enclosed in angle brackets are terminal symbols of the grammar.
  • In the above example, the non-terminal symbols are <read> and <id-list>, and the terminal symbols are the tokens READ, (, and ).
  • Thus this rule specifies that a <read> consists of the token READ, followed by the token (, followed by the language construct <id-list>, followed by the token ) (a short derivation example follows this list).
  • The blank spaces in the grammar rules are not significant. They are included only to improve readability.
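As a short worked example, the statement READ ( VALUE ) on line 9 of the STATS program can be derived from the grammar by first applying rule 13 and then rule 6:

<read>  =>  READ ( <id-list> )        {apply rule 13}
        =>  READ ( id )               {apply rule 6; the id here is VALUE}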

Lexical Analysis

  • Lexical analysis involves scanning the program to be compiled and recognizing the tokens that make up the source statements.
  • Scanners are usually designed to recognize keywords, operators, and identifiers, as well as integers, floating-point numbers, character strings, and other similar items that are written as part of the source program.
  • Items such as identifiers and integers are usually recognized directly as single tokens.

For example, an identifier might instead be defined character by character by grammar rules such as <ident> ::= <letter> | <ident> <letter> | <ident> <digit>.

  • In that case, the scanner would recognize as tokens the single characters A, B, C, D, 1, and so on, rather than whole identifiers.
  • The output of the scanner consists of a sequence of tokens.
  • When the token being scanned is a keyword or an operator, its token code alone gives sufficient information. For tokens such as identifiers and integers, this is accomplished by associating a token specifier with the token code; the specifier gives the identifier name, integer value, etc., that was found by the scanner (see the coded example after this list).
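For instance, using the token codes that appear in the parser procedures later in this section (8 for READ, 20 for "(", 22 for an identifier, 14 for ",", and 21 for ")"), the statement READ ( VALUE ) might be passed to the parser as the coded stream

8   20   22 (specifier: VALUE)   21

where only the identifier token needs a specifier. The exact coding scheme is compiler-specific; the codes used here simply match the ones assumed in the later pseudocode.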

Modeling the Scanner as a Finite Automaton

  • The tokens can be recognized by a finite automaton.
  • Mathematically, a finite automaton consists of a finite set of states and a set of transitions from one state to another.
  • One of the states is designated as the starting state, and one or more states are designated as final states.
  • Finite automata are usually represented graphically:

  • States are represented by circles, and transitions by arrows from one state to another. Each arrow is labeled with a character or set of characters that causes the specified transition to occur.
  • The starting state is marked by an arrow entering it that does not come from any other state. Final states are identified by double circles.

  • A finite automaton provides an easy way to visualize the operation of a scanner.
  • The algorithm to recognize identifiers that may contain underscores is as follows (a table-driven code sketch follows the tabular figure below).

get first Input_Character
if Input_Character in ['A'..'Z'] then
   begin
      last_char_is_underscore := false
      while Input_Character in ['A'..'Z', '0'..'9'] do
         begin
            get next Input_Character
            if Input_Character = '_' then
               begin
                  get next Input_Character
                  last_char_is_underscore := true
               end { if '_' }
            else
               last_char_is_underscore := false
         end { while }
      if last_char_is_underscore then
         return (token_error)
      else
         return (valid_token)
   end { if first in ['A'..'Z'] }
else
   return (token_error)

Fig: Token recognition using an algorithmic approach

State    A–Z    0–9    _
1        2      -      -      (starting state)
2        2      2      3      (final state)
3        2      2      -

 

Fig: Tabular representation of the finite automaton
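A minimal sketch of how the tabular automaton can drive a scanner is shown below (Python is used only for illustration, and the function and variable names are hypothetical). The transition table is the one in the figure above: state 1 is the starting state, state 2 is the only final state, and a missing entry means the input is rejected.

# Illustrative table-driven recognizer for identifiers that may contain
# underscores, based on the transition table above.
import string

TRANSITIONS = {
    (1, "letter"): 2,
    (2, "letter"): 2,
    (2, "digit"): 2,
    (2, "_"): 3,
    (3, "letter"): 2,
    (3, "digit"): 2,
}
FINAL_STATES = {2}

def char_class(ch):
    # Map a character to the column of the transition table.
    if ch in string.ascii_uppercase:
        return "letter"
    if ch in string.digits:
        return "digit"
    if ch == "_":
        return "_"
    return None

def is_valid_identifier(token):
    # Run the automaton over the token; accept only if it ends in a final state.
    state = 1
    for ch in token:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False          # no transition defined: reject immediately
    return state in FINAL_STATES

print(is_valid_identifier("SUM_SQ"))   # True
print(is_valid_identifier("SUM_"))     # False (ends with an underscore)
print(is_valid_identifier("1SUM"))     # False (does not start with a letter)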

 

Syntactic Analysis

  • During syntactic analysis, the source statements written by the programmer are recognized as the language constructs described by the grammar being used.
  • This process can be thought of as building the parse tree for the statements being translated.

Parsing techniques are divided into two general classes:

  • Bottom-up parsing
  • Top-down parsing
  • Top-down methods begin with the rule of the grammar that specifies the goal of the analysis, and attempt to construct the tree so that the terminal nodes match the statement being analyzed.
  • Bottom-up methods begin with the terminal nodes of the tree, and attempt to combine these into successively higher-level nodes until the root is reached.

Recursive Descent Parsing

A top-down method known as recursive descent is made up of a procedure for each non-terminal symbol in the grammar.

When a procedure is called, it attempts to find a substring of the input, beginning with the current token, that can be interpreted as the non-terminal with which the procedure is associated.

In doing so, it may call other procedures, or even call itself recursively, to search for other non-terminals.

  • If a procedure finds the non-terminal that is its goal, it returns an indication of success to its caller.

If a procedure is unable to find a substring that can be interpreted as the desired non-terminal, it returns an indication of failure, or invokes an error-diagnosis and recovery routine.

  • <read> ::= READ ( <id-list> )

Consider, for example, the procedure for <read> in a recursive descent parser. It first examines the next two input tokens, looking for READ and (. If these are found, the procedure for <read> then calls the procedure for <id-list>. If that procedure succeeds, the <read> procedure examines the next input token, looking for ). If all these tests are successful, the <read> procedure advances past ) and returns an indication of success to its caller. Otherwise, it returns an indication of failure.

  • If we attempt to write a complete set of procedures for the grammar, we discover a problem. The procedure for <id-list>, corresponding to

<id-list> ::= id | <id-list> , id

would be unable to decide between its two alternatives, since both id and <id-list> can begin with id.

If the procedure chose the second alternative (<id-list> , id), it would immediately call itself recursively to find an <id-list>.

This would lead to another immediate recursive call, and so to an unending chain. The reason is that one of the alternatives for <id-list> begins with <id-list>. A top-down parser cannot be used directly with a grammar that contains this kind of immediate left recursion.

  • <id-list> ::= id { , id }

This notation, which is a common extension to BNF, specifies that the terms between { and } may be omitted, or repeated one or more times. The rule above defines <id-list> as being composed of an id followed by zero or more occurrences of ", id". This eliminates the problem of left recursion and removes the need for the <id-list> procedure to choose between alternatives.

The procedures below illustrate a recursive descent parser for the READ statement; a compact Python sketch of the same logic follows them.

procedure READ
begin
   FOUND := FALSE
   if TOKEN = 8 { READ } then
      begin
         advance to next token
         if TOKEN = 20 { ( } then
            begin
               advance to next token
               if IDLIST returns success then
                  if TOKEN = 21 { ) } then
                     begin
                        FOUND := TRUE
                        advance to next token
                     end { if ) }
            end { if ( }
      end { if READ }
   if FOUND = TRUE then
      return success
   else
      return failure
end { READ }

procedure IDLIST
begin
   FOUND := FALSE
   if TOKEN = 22 { id } then
      begin
         FOUND := TRUE
         advance to next token
         while (TOKEN = 14 { , }) and (FOUND = TRUE) do
            begin
               advance to next token
               if TOKEN = 22 { id } then
                  advance to next token
               else
                  FOUND := FALSE
            end { while }
      end { if id }
   if FOUND = TRUE then
      return success
   else
      return failure
end { IDLIST }
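For comparison, a compact Python rendering of the same control flow is sketched below. The token codes follow the comments in the pseudocode above (8 = READ, 20 = "(", 21 = ")", 22 = id, 14 = ","); the Parser class and its helper methods are hypothetical and are meant only to show the structure of the two procedures.

# Illustrative recursive descent routines mirroring the READ and IDLIST
# procedures above.  Tokens are (code, value) pairs.
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def token(self):
        # Code of the current token, or None at end of input.
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def advance(self):
        self.pos += 1

    def read_stmt(self):
        # <read> ::= READ ( <id-list> )
        if self.token() != 8:            # READ
            return False
        self.advance()
        if self.token() != 20:           # (
            return False
        self.advance()
        if not self.id_list():
            return False
        if self.token() != 21:           # )
            return False
        self.advance()
        return True

    def id_list(self):
        # <id-list> ::= id { , id }   (left recursion removed)
        if self.token() != 22:           # id
            return False
        self.advance()
        while self.token() == 14:        # ,
            self.advance()
            if self.token() != 22:       # id
                return False
            self.advance()
        return True

# READ ( SUM , VALUE )
tokens = [(8, "READ"), (20, "("), (22, "SUM"), (14, ","), (22, "VALUE"), (21, ")")]
print(Parser(tokens).read_stmt())        # True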