
Slides02 - Computer Science and Engineering


CS308 Compiler Principles

Lexical Analyzer

Fan Wu

Department of Computer Science and Engineering

Shanghai Jiao Tong University


Lexical Analyzer

• The lexical analyzer reads the source program character by character to produce tokens.
– strips out comments and whitespace
– returns a token when the parser asks for one
– correlates error messages with the source program

Compiler Principles


Token

• A token is a pair of a token name and an optional attribute value.
– Token name specifies the pattern of the token
– Attribute stores the lexeme of the token
• Tokens
– Keyword: “begin”, “if”, “else”, …
– Identifier: string of letters or digits, starting with a letter
– Integer: a non-empty string of digits
– Punctuation symbol: “,”, “;”, “(”, “)”, …
• Regular expressions are widely used to specify patterns of the tokens.
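The token classes listed above can be sketched as a small regex-driven scanner. This is only an illustration in Python, not the course's implementation: the keyword set is taken from the slide's examples, and the names `KEYWORDS`, `TOKEN_RE` and `tokenize` are hypothetical.

```python
import re

KEYWORDS = {"begin", "if", "else"}  # assumed keyword set (the slide's examples)

# One named group per token class.
TOKEN_RE = re.compile(r"""
    (?P<id>[A-Za-z][A-Za-z0-9]*)   # letters or digits, starting with a letter
  | (?P<integer>[0-9]+)            # non-empty string of digits
  | (?P<punct>[,;()])              # punctuation symbols
  | (?P<ws>\s+)                    # whitespace is stripped, not returned
""", re.VERBOSE)

def tokenize(text):
    """Return a list of (token name, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ws":
            continue                      # strip whitespace
        if kind == "id" and lexeme in KEYWORDS:
            kind = "keyword"              # keywords look like identifiers
        tokens.append((kind, lexeme))
    return tokens
```

Here the attribute value of each token is simply its lexeme, as in the slide's definition.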


Token Example

[Figure: token example]


Terminology of Languages

• Alphabet: a finite set of symbols
– ASCII
– Unicode
• String: a finite sequence of symbols over an alphabet
– ε is the empty string
– |s| is the length of string s
– Concatenation: xy represents x followed by y
– Exponentiation: sⁿ = s s … s (n times), s⁰ = ε
• Language: a set of strings over some fixed alphabet
– the empty set is a language
– The set of well-formed C programs is a language


Operations on Languages

• Union: L₁ ∪ L₂ = { s | s ∈ L₁ or s ∈ L₂ }
• Concatenation: L₁L₂ = { s₁s₂ | s₁ ∈ L₁ and s₂ ∈ L₂ }
• (Kleene) Closure: $L^{*} = \bigcup_{i=0}^{\infty} L^{i}$
• Positive Closure: $L^{+} = \bigcup_{i=1}^{\infty} L^{i}$


Example

• L₁ = {a,b,c,d}, L₂ = {1,2}
• L₁ ∪ L₂ = {a,b,c,d,1,2}
• L₁L₂ = {a1,a2,b1,b2,c1,c2,d1,d2}
• L₁* = all strings using letters a,b,c,d, including the empty string
• L₁⁺ = all strings using letters a,b,c,d, without the empty string
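The operations on languages can be tried out on finite sets of Python strings. The sketch below is an illustration only: the helper names are hypothetical, ε is represented by the empty string, and because L* is infinite the `closure` helper returns just the finite part built from powers up to `max_i`.

```python
from itertools import product

def union(l1, l2):
    return l1 | l2

def concat(l1, l2):
    # every string of l1 followed by every string of l2
    return {s1 + s2 for s1, s2 in product(l1, l2)}

def power(l, i):
    """i-fold concatenation of l with itself; power(l, 0) is {""}, i.e. {ε}."""
    result = {""}
    for _ in range(i):
        result = concat(result, l)
    return result

def closure(l, max_i, positive=False):
    """Finite approximation of L* (or L+): union of the powers up to max_i."""
    out = set()
    for i in range(1 if positive else 0, max_i + 1):
        out |= power(l, i)
    return out

L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}
```

With these definitions, `concat(L1, L2)` gives the eight strings a1, a2, ..., d2, and the empty string is in `closure(L1, 2)` but not in `closure(L1, 2, positive=True)`.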


Regular Expressions

• A regular expression is a representation of a language that can be built from operators applied to the symbols of some alphabet.
• A regular expression is built up from smaller regular expressions (using defining rules).
• Each regular expression r denotes a language L(r).
• A language denoted by a regular expression is called a regular set.


Regular Expressions (Rules)

Regular expressions over alphabet Σ

Reg. Expr            Language it denotes
ε                    L(ε) = {ε}
a ∈ Σ                L(a) = {a}
(r₁) | (r₂)          L(r₁) ∪ L(r₂)
(r₁)(r₂)             L(r₁) L(r₂)
(r)*                 (L(r))*
(r)                  L(r)

Extension
(r)+ = (r)(r)*       (L(r))+
(r)? = (r) | ε       L(r) ∪ {ε}          zero or one instance
[a₁-aₙ]              L(a₁|a₂|…|aₙ)       character class


Regular Expressions (cont.)

• We may remove parentheses by using precedence rules:
– * highest
– concatenation second highest
– | lowest
• (a(b)*)|(c) can be written as ab*|c
• Example:
– Σ = {0,1}
– 0|1 => {0,1}
– (0|1)(0|1) => {00,01,10,11}
– 0* => {ε, 0, 00, 000, 0000, ...}
– (0|1)* => all strings of 0s and 1s, including the empty string
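The small examples over {0,1} can be checked by brute force. The sketch below uses Python's `re` module, whose syntax for `|`, `*` and parentheses agrees with the slide's notation for these patterns; the helper `language` and its parameters are hypothetical.

```python
import re
from itertools import product

def language(pattern, alphabet="01", max_len=3):
    """Strings over `alphabet` of length <= max_len matched in full by pattern."""
    regex = re.compile(pattern)
    return [
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if regex.fullmatch("".join(chars))
    ]
```

For instance, `language("(0|1)(0|1)")` lists 00, 01, 10 and 11, and `language("ab*|c", "abc", 2)` equals `language("(a(b)*)|(c)", "abc", 2)`, which illustrates the precedence rules.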


Regular Definitions

• We can give names to regular expressions, and use these names as symbols to define other regular expressions.
• A regular definition is a sequence of definitions of the form:

    d₁ → r₁
    d₂ → r₂
    …
    dₙ → rₙ

  where each dᵢ is a new symbol and each rᵢ is a regular expression over the symbols in Σ ∪ {d₁, d₂, ..., dᵢ₋₁} (the alphabet plus the previously defined symbols).


Regular Definitions Example

• Example: Identifiers in Pascal

    letter → A | B | ... | Z | a | b | ... | z
    digit → 0 | 1 | ... | 9
    id → letter (letter | digit)*

– If we try to write the regular expression representing identifiers without using regular definitions, that regular expression becomes complex:

    (A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) )*
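The same regular definition can be mirrored in code by composing pattern strings, each built only from names defined before it. This is a sketch using Python's `re` syntax; character classes stand in for the long alternations, and the names `ident`, `ID_RE` and `is_identifier` are hypothetical.

```python
import re

# Each definition refers only to previously defined names.
letter = "[A-Za-z]"
digit = "[0-9]"
ident = f"{letter}({letter}|{digit})*"   # id -> letter (letter | digit)*

ID_RE = re.compile(ident)

def is_identifier(s):
    """True if the whole string s matches the id definition."""
    return ID_RE.fullmatch(s) is not None
```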


Grammar

Regular Definitions


Transition Diagram

• State: represents a condition that could occur during scanning
– start/initial state
– accepting/final state: lexeme found
– intermediate state
• Edge: directed from one state to another, labeled with one symbol or a set of symbols


Transition Diagram for relop

Transition diagram for relop → < | > | <= | >= | = | <>


Transition-Diagram-Based Lexical Analyzer

[Figure: implementation of the relop transition diagram]
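Since the slide's figure is not available, here is a hedged sketch of how a transition diagram for relop is commonly turned into code: one branch per state, with a "retract" when the character just read is not part of the lexeme. The state numbers, the attribute names (LT, LE, NE, EQ, GT, GE), the inclusion of `<>`, and the function `get_relop` are assumptions, not taken from the slide.

```python
def get_relop(text, pos=0):
    """Scan one relational operator starting at text[pos].

    Returns (attribute, next_pos), or None if no operator starts there.
    """
    state = 0
    while True:
        c = text[pos] if pos < len(text) else ""   # "" stands for end of input
        if state == 0:                 # start state
            if c == "<":
                state = 1
            elif c == "=":
                return "EQ", pos + 1
            elif c == ">":
                state = 6
            else:
                return None            # not a relop
            pos += 1
        elif state == 1:               # seen "<"
            if c == "=":
                return "LE", pos + 1
            if c == ">":
                return "NE", pos + 1
            return "LT", pos           # retract: c is not part of the lexeme
        elif state == 6:               # seen ">"
            if c == "=":
                return "GE", pos + 1
            return "GT", pos           # retract
```

For example, `get_relop("a>=b", 1)` returns `("GE", 3)`, while `get_relop("<a")` returns `("LT", 1)` and leaves the `a` for the next token.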


Transition Diagram for Others

[Figure: a transition diagram for id's]

[Figure: a transition diagram for unsigned numbers]


Practice

• Draw the transition diagram for recognizing the following regular expression:

    a(a|b)*a

[Figures: a nondeterministic and a deterministic transition diagram for a(a|b)*a, each over states 1, 2, 3]


Finite Automata

• A finite automaton is a recognizer that takes a string and answers “yes” if the string matches a pattern of a specified language, and “no” otherwise.
• Two kinds:
– Nondeterministic finite automaton (NFA)
  • no restriction on the labels of its edges
– Deterministic finite automaton (DFA)
  • for each state and each input symbol, exactly one edge labeled with that symbol leaves the state
• NFAs and DFAs have the same capability: both recognize exactly the regular languages
• We may use either an NFA or a DFA as a lexical analyzer


Nondeterministic Finite Automaton (NFA)

• An NFA consists of:
– S: a set of states
– Σ: a set of input symbols (alphabet)
– A transition function: maps state-symbol pairs to sets of states
– s₀: a start (initial) state
– F: a set of accepting states (final states)
• An NFA can be represented by a transition graph
• It accepts a string x if and only if there is a path from the start state to one of the accepting states such that the edge labels along this path spell out x.
• Remarks
– The same symbol can label edges from one state to several different states
– An edge may be labeled by ε, the empty string


NFA Example (1)

[Figure: an NFA]

The language recognized by this NFA is (a|b)*ab


NFA Example (2)

[Figure: NFA accepting aa*|bb*]


Implementing an NFA

S ← ε-closure({s₀})              { set of all states accessible from s₀ by ε-transitions }
c ← nextchar
while (c != eof) do
begin
    S ← ε-closure(move(S,c))     { set of all states accessible from a state in S by a transition on c }
    c ← nextchar
end
if (S ∩ F != ∅) then             { if S contains an accepting state }
    return “yes”
else
    return “no”

Subset Construction
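The NFA simulation loop can be sketched in Python as follows. The automaton used here is an assumed NFA for aa*|bb* (the slide's own figures are not available), and the names `DELTA`, `eps_closure`, `move` and `nfa_accepts` are hypothetical.

```python
EPS = ""  # label used for ε-edges

# Assumed NFA for aa*|bb*:
# 0 -ε-> 1, 0 -ε-> 3, 1 -a-> 2, 2 -a-> 2, 3 -b-> 4, 4 -b-> 4
DELTA = {
    (0, EPS): {1, 3},
    (1, "a"): {2},
    (2, "a"): {2},
    (3, "b"): {4},
    (4, "b"): {4},
}
START, ACCEPTING = 0, {2, 4}

def eps_closure(states):
    """All states reachable from `states` using ε-edges only."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in DELTA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def move(states, c):
    """All states reachable from a state in `states` by one edge labeled c."""
    return {t for s in states for t in DELTA.get((s, c), ())}

def nfa_accepts(text):
    S = eps_closure({START})
    for c in text:
        S = eps_closure(move(S, c))
    return bool(S & ACCEPTING)   # does S contain an accepting state?
```

The set S plays the role of the current "set of states" in the pseudocode; an empty S simply stays empty until the input ends.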


Deterministic Finite Automaton (DFA)

• A Deterministic Finite Automaton (DFA) is a special form of an NFA.
– No state has an ε-transition
– For each symbol a and state s, there is at most one edge labeled a leaving s.

[Figure: a DFA]

The language recognized by this DFA is also (a|b)*ab


Implementing a DFA

s ← s₀                       { start from the initial state }
c ← nextchar                 { get the next character from the input string }
while (c != eof) do          { do until the end of the string }
begin
    s ← move(s,c)            { transition function }
    c ← nextchar
end
if (s in F) then             { if s is an accepting state }
    return “yes”
else
    return “no”
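A table-driven version of this loop might look like the sketch below. The transition table is an assumed three-state DFA for (a|b)*ab, written by hand for illustration (the slide's figure is not available); a missing table entry is treated as rejection.

```python
# Assumed DFA for (a|b)*ab: state 1 = "just saw a", state 2 = "just saw ab".
DTRAN = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
START, ACCEPTING = 0, {2}

def dfa_accepts(text):
    s = START                      # start from the initial state
    for c in text:
        s = DTRAN.get((s, c))      # transition function move(s, c)
        if s is None:              # no edge on c: reject
            return False
    return s in ACCEPTING          # is s an accepting state?
```

Each input character costs one table lookup, which is the reason the DFA is the faster of the two models.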


NFA vs. DFA

        Compactness   Readability   Speed
NFA     Good          Good          Slow
DFA     Bad           Bad           Fast

• DFAs are widely used to build lexical analyzers.

[Figures: an NFA and a DFA, both recognizing the language (a|b)*ab]


Pop Quiz

1) What are the languages represented by the two FAs?

(a) [Figure: finite automaton with states 1–9 and edges labeled 0 and 1]

Solution: strings of 0s and 1s with length 4, except 0110

(b) [Figure: finite automaton with states 1–5 and edges labeled a]

Solution: a(aaaaa)*


Pop Quiz

2) For a language over the alphabet {0,1}, design a DFA that accepts all strings containing three ‘0’s.

Solution:

[Figure: DFA with states 1, 2, 3, 4 and edges labeled 0 and 1]


Regular Expression → NFA

• McNaughton-Yamada-Thompson (MYT) construction
– Simple and systematic
– Construction starts from the simplest parts (alphabet symbols).
– For a complex regular expression, the NFAs of its subexpressions are combined to create its NFA.
– Guarantees that the resulting NFA has exactly one final state and one start state.


MYT Construction

• Basic rules: for subexpressions with no operators
– For expression ε:

    start --> i --ε--> f

– For a symbol a in the alphabet Σ:

    start --> i --a--> f


MYT Construction Cont’d

• Inductive rules: for constructing larger NFAs from the NFAs of subexpressions
(Let N(r₁) and N(r₂) denote NFAs for regular expressions r₁ and r₂, respectively)
– For regular expression r₁ | r₂

[Figure: a new start state i with ε-edges to the start states of N(r₁) and N(r₂), and ε-edges from their accepting states to a new accepting state f]


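MYT Construction Cont’d

– For regular expression r₁r₂

[Figure: N(r₁) followed by N(r₂); i is the start state of N(r₁), f is the accepting state of N(r₂), and the accepting state of N(r₁) is merged with the start state of N(r₂)]

– For regular expression r*

[Figure: a new start state i and a new accepting state f, with ε-edges from i to the start state of N(r), from the accepting state of N(r) to f, from the accepting state of N(r) back to its start state, and from i to f]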


Example: (a|b)*a

[Figures: NFAs built step by step for a, b, (a|b), (a|b)*, and (a|b)*a]
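The construction rules can be sketched as four small combinators that each return a (start, accept) pair. This is an illustrative Python sketch under assumptions, not the slides' code: states are integers from a global counter, `edges` maps a state to a list of (label, target) pairs, ε is represented by `None`, and all function names are hypothetical.

```python
from itertools import count

_ids = count()
EPS = None  # label used for ε-edges

def _new_state(edges):
    s = next(_ids)
    edges[s] = []
    return s

def symbol(a, edges):
    i, f = _new_state(edges), _new_state(edges)
    edges[i].append((a, f))
    return i, f

def union(n1, n2, edges):            # r1 | r2
    i, f = _new_state(edges), _new_state(edges)
    edges[i] += [(EPS, n1[0]), (EPS, n2[0])]
    edges[n1[1]].append((EPS, f))
    edges[n2[1]].append((EPS, f))
    return i, f

def concat(n1, n2, edges):           # r1 r2
    # Merge the accepting state of N(r1) with the start state of N(r2):
    # the start state has no incoming edges, so moving its edges suffices.
    edges[n1[1]] = edges.pop(n2[0])
    return n1[0], n2[1]

def star(n, edges):                  # r*
    i, f = _new_state(edges), _new_state(edges)
    edges[i] += [(EPS, n[0]), (EPS, f)]
    edges[n[1]] += [(EPS, n[0]), (EPS, f)]
    return i, f

def eps_closure(states, edges):
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for label, t in edges[s]:
            if label is EPS and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, edges, text):
    start, accept = nfa
    cur = eps_closure({start}, edges)
    for ch in text:
        step = {t for s in cur for label, t in edges[s] if label == ch}
        cur = eps_closure(step, edges)
    return accept in cur

# Build the NFA for (a|b)*a bottom-up, as in the example.
E = {}
N = concat(star(union(symbol("a", E), symbol("b", E), E), E), symbol("a", E), E)
```

The resulting NFA has 9 states, within the bound of twice the 6 operators and operands of (a|b)*a.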


Properties of the Constructed NFA

1. N(r) has at most twice as many states as there are operators and operands in r.
– This bound follows from the fact that each step of the algorithm creates at most two new states.
2. N(r) has one start state and one accepting state. The accepting state has no outgoing transitions, and the start state has no incoming transitions.
3. Each state of N(r) other than the accepting state has either one outgoing transition on a symbol in Σ ∪ {ε} or two outgoing transitions, both on ε.


Conversion of an NFA to a DFA

• Approach: Subset Construction
– each state of the constructed DFA corresponds to a set (combination) of NFA states
• Details
1 Create the transition table Dtran for the DFA
2 Insert ε-closure(s₀) into Dstates as the initial state
3 Pick an unvisited state T in Dstates
4 For each symbol a, create the state ε-closure(move(T, a)) and add it to Dstates and Dtran
5 Repeat steps (3) and (4) until all states in Dstates are visited


The Subset Construction

[Figure: the subset construction]


NFA to DFA Example

[Figures: NFA for (a|b)*abb; transition table for the DFA; equivalent DFA]
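The subset construction steps can be sketched as a worklist algorithm. The NFA below is an assumed eleven-state NFA for (a|b)*abb of the kind produced by the MYT construction; it may differ from the one in the slide's figure. DFA states are represented as frozensets of NFA states, and all names are hypothetical.

```python
EPS = ""  # label used for ε-edges

# Assumed NFA for (a|b)*abb, states 0..10, accepting state 10.
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}
NFA_START, NFA_ACCEPT, ALPHABET = 0, 10, "ab"

def eps_closure(states):
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(T, a):
    return {t for s in T for t in NFA.get((s, a), ())}

def subset_construction():
    start = eps_closure({NFA_START})
    dstates, unvisited, dtran = {start}, [start], {}
    while unvisited:
        T = unvisited.pop()                  # pick an unvisited state
        for a in ALPHABET:
            U = eps_closure(move(T, a))
            if not U:
                continue                     # no transition on a
            if U not in dstates:
                dstates.add(U)
                unvisited.append(U)
            dtran[T, a] = U
    accepting = {T for T in dstates if NFA_ACCEPT in T}
    return start, dstates, dtran, accepting
```

For this NFA the construction yields five DFA states, one of which (the set containing NFA state 10) is accepting.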


Regular Expression → DFA

• First, augment the given regular expression by concatenating a special symbol #

    r → r#    augmented regular expression

• Second, create a syntax tree for the augmented regular expression.
– All leaves are alphabet symbols (plus # and the empty string)
– All inner nodes are operators
• Third, number each alphabet symbol (plus #) (position numbers)


Regular Expression → DFA Cont’d

(a|b)*a → (a|b)*a#    augmented regular expression

[Figure: syntax tree of (a|b)*a#, with leaves a (position 1), b (position 2), a (position 3), # (position 4)]

Syntax tree of (a|b)*a#
• each symbol is at a leaf
• each symbol is numbered (positions)
• inner nodes are operators


followpos

Then we define the function followpos for the positions (the numbers assigned to leaves).

followpos(i) -- the set of positions which can follow the position i in the strings generated by the augmented regular expression.

Example:

    ( a | b )* a #
      1   2    3 4

followpos(1) = {1,2,3}
followpos(2) = {1,2,3}
followpos(3) = {4}
followpos(4) = {}

followpos is defined only for leaves, not for inner nodes.


firstpos, lastpos, nullable

• To compute followpos, we need three more functions defined for the nodes (not just the leaves) of the syntax tree.
– firstpos(n) -- the set of positions of the first symbols of strings generated by the subexpression rooted at n.
– lastpos(n) -- the set of positions of the last symbols of strings generated by the subexpression rooted at n.
– nullable(n) -- true if the empty string is a member of the strings generated by the subexpression rooted at n; false otherwise


Usage of the Functions

(a|b)*a → (a|b)*a#    augmented regular expression

[Figure: syntax tree of (a|b)*a#; m is the star node for (a|b)*, n is the concatenation node for (a|b)*a]

nullable(n) = false
nullable(m) = true
firstpos(n) = {1, 2, 3}
lastpos(n) = {3}


Computing nullable, firstpos, lastpos

n: leaf labeled ε
    nullable(n) = true
    firstpos(n) = ∅
    lastpos(n) = ∅

n: leaf labeled with position i
    nullable(n) = false
    firstpos(n) = {i}
    lastpos(n) = {i}

n: or-node c₁ | c₂
    nullable(n) = nullable(c₁) or nullable(c₂)
    firstpos(n) = firstpos(c₁) ∪ firstpos(c₂)
    lastpos(n) = lastpos(c₁) ∪ lastpos(c₂)

n: concatenation node c₁ c₂
    nullable(n) = nullable(c₁) and nullable(c₂)
    firstpos(n) = if (nullable(c₁)) firstpos(c₁) ∪ firstpos(c₂) else firstpos(c₁)
    lastpos(n) = if (nullable(c₂)) lastpos(c₁) ∪ lastpos(c₂) else lastpos(c₂)

n: star-node c₁*
    nullable(n) = true
    firstpos(n) = firstpos(c₁)
    lastpos(n) = lastpos(c₁)


How to evaluate followpos

• Two rules define the function followpos:
1. If n is a concatenation node with left child c₁ and right child c₂, and i is a position in lastpos(c₁), then all positions in firstpos(c₂) are in followpos(i).
2. If n is a star node, and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).
• If firstpos and lastpos have been computed for each node, followpos of each position can be computed by making one depth-first traversal of the syntax tree.


Example -- ( a | b )* a #

Node                   firstpos    lastpos
a (position 1)         {1}         {1}
b (position 2)         {2}         {2}
a | b                  {1,2}       {1,2}
(a|b)*                 {1,2}       {1,2}
a (position 3)         {3}         {3}
(a|b)* a               {1,2,3}     {3}
# (position 4)         {4}         {4}
(a|b)* a # (root)      {1,2,3}     {4}

Then we can calculate followpos

followpos(1) = {1,2,3}
followpos(2) = {1,2,3}
followpos(3) = {4}
followpos(4) = {}

• After we calculate follow positions, we are ready to create the DFA for the regular expression.
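The rules for nullable, firstpos, lastpos and followpos can be evaluated in a single depth-first traversal. The sketch below is an illustration under assumptions: the syntax tree is encoded as nested tuples (`"leaf"`, `"eps"`, `"or"`, `"cat"`, `"star"`), and this encoding and the function name `analyze` are hypothetical.

```python
def analyze(node, followpos):
    """Return (nullable, firstpos, lastpos) of node and fill in followpos."""
    kind = node[0]
    if kind == "eps":                         # leaf labeled ε
        return True, set(), set()
    if kind == "leaf":                        # ("leaf", symbol, position)
        pos = node[2]
        followpos.setdefault(pos, set())
        return False, {pos}, {pos}
    if kind == "or":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        for i in l1:                          # rule 1: concatenation node
            followpos[i] |= f2
        return n1 and n2, (f1 | f2 if n1 else f1), (l1 | l2 if n2 else l2)
    if kind == "star":
        _, f, l = analyze(node[1], followpos)
        for i in l:                           # rule 2: star node
            followpos[i] |= f
        return True, f, l
    raise ValueError(f"unknown node kind: {kind}")

# Syntax tree of (a|b)*a# with positions 1..4.
TREE = ("cat",
        ("cat",
         ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2))),
         ("leaf", "a", 3)),
        ("leaf", "#", 4))

FOLLOWPOS = {}
ROOT_NULLABLE, ROOT_FIRST, ROOT_LAST = analyze(TREE, FOLLOWPOS)
```

Running it on this tree reproduces the followpos values listed in the example, with firstpos(root) = {1,2,3}.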


Algorithm (RE → DFA)

1. Create the syntax tree of (r)#
2. Calculate nullable, firstpos, lastpos, followpos
3. Put firstpos(root) into the states of the DFA as an unmarked state.
4. while (there is an unmarked state S in the states of the DFA) do
– mark S
– for each input symbol a do
  • let s₁,...,sₙ be the positions in S whose symbol is a
  • S’ ← followpos(s₁) ∪ ... ∪ followpos(sₙ)
  • Dtran[S,a] ← S’
  • if (S’ is not in the states of the DFA)
    – put S’ into the states of the DFA as an unmarked state.
• the start state of the DFA is firstpos(root)
• the accepting states of the DFA are all states containing the position of #


Example -- ( a | b )* a #

    ( a | b )* a #
      1   2    3 4

followpos(1)={1,2,3}   followpos(2)={1,2,3}
followpos(3)={4}       followpos(4)={}

S₁ = firstpos(root) = {1,2,3}

mark S₁
a: followpos(1) ∪ followpos(3) = {1,2,3,4} = S₂    Dtran[S₁,a] = S₂
b: followpos(2) = {1,2,3} = S₁                     Dtran[S₁,b] = S₁

mark S₂
a: followpos(1) ∪ followpos(3) = {1,2,3,4} = S₂    Dtran[S₂,a] = S₂
b: followpos(2) = {1,2,3} = S₁                     Dtran[S₂,b] = S₁

start state: S₁
accepting states: {S₂}

[Figure: the resulting DFA with states S₁ and S₂]
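The marking loop of the algorithm can be sketched as below, taking the followpos table of the (a|b)*a# example as given data. The function `build_dfa` and its parameter names are hypothetical, and an empty target set is simply skipped (no dead state is created), which matches how the worked examples list transitions.

```python
def build_dfa(symbol_at, followpos, first_root, alphabet):
    start = frozenset(first_root)
    states, unmarked, dtran = {start}, [start], {}
    while unmarked:
        S = unmarked.pop()                    # "mark" S by taking it off the worklist
        for a in alphabet:
            # union of followpos(p) over the positions p in S that carry symbol a
            target = frozenset().union(
                *(followpos[p] for p in S if symbol_at[p] == a))
            if not target:
                continue                      # no position in S carries a
            dtran[S, a] = target
            if target not in states:
                states.add(target)
                unmarked.append(target)
    end_pos = next(p for p, sym in symbol_at.items() if sym == "#")
    accepting = {S for S in states if end_pos in S}
    return start, states, dtran, accepting

# Data for (a|b)*a#, copied from the example.
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "#"}
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: set()}
START, STATES, DTRAN, ACCEPTING = build_dfa(SYMBOL_AT, FOLLOWPOS, {1, 2, 3}, "ab")
```

The result has the two states {1,2,3} and {1,2,3,4}, with the latter accepting because it contains position 4.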


Example -- ( a | ε ) b c* #

    ( a | ε ) b c* #
      1       2 3  4

followpos(1)={2}   followpos(2)={3,4}   followpos(3)={3,4}
followpos(4)={}

S₁ = firstpos(root) = {1,2}

mark S₁
a: followpos(1) = {2} = S₂      Dtran[S₁,a] = S₂
b: followpos(2) = {3,4} = S₃    Dtran[S₁,b] = S₃

mark S₂
b: followpos(2) = {3,4} = S₃    Dtran[S₂,b] = S₃

mark S₃
c: followpos(3) = {3,4} = S₃    Dtran[S₃,c] = S₃

start state: S₁
accepting states: {S₃}

[Figure: the resulting DFA with states S₁, S₂ and S₃]


Minimizing Number of DFA States

• For any regular language, there is always a unique minimum-state DFA (up to renaming of states), which can be constructed from any DFA of the language.
• Algorithm:
– Partition the set of states into two groups:
  • G₁: set of accepting states
  • G₂: set of non-accepting states
– For each new group G
  • partition G into subgroups such that states s₁ and s₂ are in the same subgroup iff, for all input symbols a, s₁ and s₂ have transitions on a to states in the same group.
– Repeat until no group can be partitioned further.
– Start state of the minimized DFA is the group containing the start state of the original DFA.
– Accepting states of the minimized DFA are the groups containing the accepting states of the original DFA.


Minimizing DFA – Example (1)

[Figure: DFA with states 1, 2, 3 over the symbols a and b]

G₁ = {2}
G₂ = {1,3}

G₂ cannot be partitioned because

Dtran[1,a]=2    Dtran[1,b]=3
Dtran[3,a]=2    Dtran[3,b]=3

So, the minimized DFA (with minimum states) is

[Figure: minimized DFA with states 1 and 2]


Minimizing DFA – Example (2)

[Figure: DFA with states 1, 2, 3, 4 over the symbols a and b]

Groups: {1,2,3} {4}

    a       b
    1->2    1->3
    2->2    2->3
    3->4    3->3

{1,2} {3}

no more partitioning

[Figure: minimized DFA with states 1, 2, 3]
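The partitioning step can be sketched as repeated refinement until the number of groups stops growing. The transition table below uses the entries listed for states 1 to 3 in Example (2); the outgoing edges of state 4 are not recoverable from the slide and are assumed here (the resulting groups do not depend on them). The function name `minimize` is hypothetical.

```python
def minimize(states, alphabet, dtran, accepting):
    """Return the final partition of `states` as a list of sets."""
    partition = [g for g in (set(accepting), set(states) - set(accepting)) if g]
    while True:
        group_of = {s: idx for idx, g in enumerate(partition) for s in g}
        refined = []
        for g in partition:
            buckets = {}
            for s in sorted(g):
                # states stay together iff they go to the same groups on every symbol
                key = tuple(group_of[dtran[s, a]] for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            refined.extend(buckets.values())
        if len(refined) == len(partition):    # no group was split
            return refined
        partition = refined

DTRAN = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 4, (3, "b"): 3,
    (4, "a"): 2, (4, "b"): 3,   # assumed: not given in the slide's table
}
GROUPS = minimize({1, 2, 3, 4}, "ab", DTRAN, accepting={4})
```

Starting from {1,2,3} and {4}, the first pass splits off {3} because only state 3 moves to the accepting group on a; the second pass changes nothing, so the groups are {1,2}, {3} and {4}.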


Architecture of a Lexical Analyzer

[Figure: architecture of a lexical analyzer]


An NFA for Lex program

• Create an NFA for each regular expression
• Combine all the NFAs into one
• Introduce a new start state
• Connect it with ε-transitions to the start states of the NFAs


Pattern Matching with NFA

1 The lexical analyzer reads in the input and calculates the set of states it is in at each symbol.
2 Eventually, it reaches a point with no next state.
3 It looks backwards in the sequence of sets of states until it finds a set including one or more accepting states.
4 It picks the one associated with the earliest pattern in the list from the Lex program.
5 It performs the associated action of the pattern.


Pattern Matching with NFA -- Example

Input: aaba

[Figure: sequence of sets of NFA states on input aaba]

Report pattern: a*b+


Pattern Matching with DFA

1 Convert the NFA for all the patterns into an equivalent DFA. For each DFA state that contains more than one accepting NFA state, choose the pattern defined earliest as the output of that DFA state.
2 Simulate the DFA until there is no next state.
3 Trace back to the nearest accepting DFA state, and perform the associated action.

Input: abba

DFA states visited: 0137 → 247 → 58 → 68

Report pattern abb
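The two disambiguation rules (longest match first, then the earliest pattern) can be sketched without building the automata by hand. The code below substitutes Python's `re` module for the combined NFA/DFA, so it illustrates the matching policy rather than the automaton simulation itself. The pattern list is an assumption: `abb` and `a*b+` are named on the slides, while `a` is added for illustration. For these simple patterns a greedy `re` match is also the longest match, which does not hold for arbitrary patterns.

```python
import re

# Patterns in priority order (earliest first); the list itself is assumed.
PATTERNS = [
    ("a", re.compile("a")),
    ("abb", re.compile("abb")),
    ("a*b+", re.compile("a*b+")),
]

def scan(text):
    """Split text into (pattern name, lexeme) pairs using longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in PATTERNS:
            m = regex.match(text, pos)
            # strictly longer wins; on a tie the earlier pattern is kept
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise SyntaxError(f"no pattern matches at offset {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens
```

On the input aaba the first reported pattern is a*b+ (lexeme aab), and on abba it is abb, since abb and a*b+ both match three characters and abb is listed earlier.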


Summary

• How lexical analyzers work
– Convert REs to NFA
– Convert NFA to DFA
– Minimize DFA
– Use the minimized DFA to recognize tokens in the input
– Use priorities, longest matching rule


Homework

• Exercise 3.7.1 (c)
• Exercise 3.7.3 (c)
• Exercise 3.9.4 (a)
• Due date: Oct. 9, 2014 (Monday)
