Slides02 - Computer Science and Engineering
CS308 Compiler Principles<br />
Lexical Analyzer<br />
Fan Wu<br />
Department of Computer Science and Engineering<br />
Shanghai Jiao Tong University
Lexical Analyzer<br />
• Lexical Analyzer reads the source program<br />
character by character to produce tokens.<br />
– strips out comments and whitespace<br />
– returns a token when the parser asks for one<br />
– correlates error messages with the source<br />
program<br />
Compiler Principles
Token<br />
• A token is a pair of a token name and an optional<br />
attribute value.<br />
– Token name specifies the pattern of the token<br />
– Attribute stores the lexeme of the token<br />
• Tokens<br />
– Keyword: “begin”, “if”, “else”, …<br />
– Identifier: string of letters or digits, starting with a letter<br />
– Integer: a non-empty string of digits<br />
– Punctuation symbol: “,”, “;”, “(”, “)”, …<br />
• Regular expressions are widely used to specify<br />
patterns of the tokens.<br />
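A minimal sketch of how the token classes above could be specified with regular expressions, using Python's `re` module. The token names, the keyword list, and the `tokenize` helper are assumptions made for illustration; they are not taken from the slides.

```python
import re

# Hypothetical token specification: (token name, regular expression).
# KEYWORD is listed before ID so that keywords win when both could match.
TOKEN_SPEC = [
    ("KEYWORD", r"(?:begin|if|else)\b"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),   # letters or digits, starting with a letter
    ("INT", r"[0-9]+"),                # non-empty string of digits
    ("PUNCT", r"[,;()]"),
    ("WS", r"\s+"),                    # whitespace is matched, then dropped
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token name, lexeme) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Each returned pair plays the role of (token name, attribute value), with the lexeme stored as the attribute.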
Token Example<br />
Terminology of Languages<br />
• Alphabet: a finite set of symbols<br />
– ASCII<br />
– Unicode<br />
• String: a finite sequence of symbols on an alphabet<br />
– ε is the empty string<br />
– |s| is the length of string s<br />
– Concatenation: xy represents x followed by y<br />
– Exponentiation: $s^n = s\,s \cdots s$ ($n$ times), $s^0 = \varepsilon$<br />
• Language: a set of strings over some fixed alphabet<br />
– the empty set is a language<br />
– The set of well-formed C programs is a language<br />
Operations on Languages<br />
• Union: $L_1 \cup L_2 = \{\, s \mid s \in L_1 \text{ or } s \in L_2 \,\}$<br />
• Concatenation: $L_1 L_2 = \{\, s_1 s_2 \mid s_1 \in L_1 \text{ and } s_2 \in L_2 \,\}$<br />
• (Kleene) Closure: $L^* = \bigcup_{i=0}^{\infty} L^i$<br />
• Positive Closure: $L^+ = \bigcup_{i=1}^{\infty} L^i$<br />
Example<br />
• $L_1 = \{a,b,c,d\}$, $L_2 = \{1,2\}$<br />
• $L_1 \cup L_2 = \{a,b,c,d,1,2\}$<br />
• $L_1 L_2 = \{a1,a2,b1,b2,c1,c2,d1,d2\}$<br />
• $L_1^*$ = all strings using letters a,b,c,d, including<br />
the empty string<br />
• $L_1^+$ = all strings using letters a,b,c,d, without<br />
the empty string<br />
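The example can be checked with a small sketch that models languages as Python sets of strings. Since the closures are infinite, the helper `closure_up_to` (a name made up here) only enumerates strings built from at most `k` concatenations; that bound is an assumption of the sketch, not part of the definition.

```python
from itertools import product


def concat(l1, l2):
    """Concatenation: every string of l1 followed by every string of l2."""
    return {x + y for x, y in product(l1, l2)}


def power(lang, i):
    """lang^i, with lang^0 = {""} (the set holding only the empty string)."""
    result = {""}
    for _ in range(i):
        result = concat(result, lang)
    return result


def closure_up_to(lang, k, positive=False):
    """Finite approximation of lang* (or lang+): union of lang^i for i <= k."""
    out = set()
    for i in range(1 if positive else 0, k + 1):
        out |= power(lang, i)
    return out


L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}
UNION = L1 | L2              # set union models language union
CONCAT = concat(L1, L2)
```

Here `""` stands for the empty string ε, so `"" in closure_up_to(L1, 2)` holds while `"" in closure_up_to(L1, 2, positive=True)` does not.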
Regular Expressions<br />
• A regular expression is a notation for a<br />
language, built by applying operators to the<br />
symbols of some alphabet.<br />
• A regular expression is built up from smaller<br />
regular expressions (using defining rules).<br />
• Each regular expression r denotes a<br />
language L(r).<br />
• A language denoted by a regular expression<br />
is called a regular set.<br />
Regular Expressions (Rules)<br />
Regular expressions over alphabet Σ, and the languages they denote:<br />
– ε denotes L(ε) = {ε}<br />
– a ∈ Σ denotes L(a) = {a}<br />
– (r1) | (r2) denotes L(r1) ∪ L(r2)<br />
– (r1)(r2) denotes L(r1) L(r2)<br />
– (r)* denotes (L(r))*<br />
– (r) denotes L(r)<br />
Extensions:<br />
– (r)+ = (r)(r)* denotes (L(r))+ (one or more instances)<br />
– (r)? = (r) | ε denotes L(r) ∪ {ε} (zero or one instance)<br />
– [a1-an] denotes L(a1|a2|…|an) (character class)<br />
Regular Expressions (cont.)<br />
• We may remove parentheses by using<br />
precedence rules:<br />
– * highest<br />
– concatenation second highest<br />
– | lowest<br />
• (a(b)*)|(c) can be written as ab*|c<br />
• Example:<br />
– Σ = {0,1}<br />
– 0|1 => {0,1}<br />
– (0|1)(0|1) => {00,01,10,11}<br />
– 0* => {ε,0,00,000,0000,....}<br />
– (0|1)* => all strings of 0s and 1s, including the empty<br />
string<br />
Regular Definitions<br />
• We can give names to regular expressions, and<br />
use these names as symbols to define other<br />
regular expressions.<br />
• A regular definition is a sequence of<br />
definitions of the form:<br />
d1 → r1<br />
d2 → r2<br />
…<br />
dn → rn<br />
where each di is a new symbol (not in the alphabet Σ and distinct from the other d's),<br />
and each ri is a regular expression over Σ ∪ {d1,d2,...,di-1}, that is, the<br />
alphabet plus the previously defined symbols.<br />
Regular Definitions Example<br />
• Example: Identifiers in Pascal<br />
letter → A | B | ... | Z | a | b | ... | z<br />
digit → 0 | 1 | ... | 9<br />
id → letter (letter | digit)*<br />
– If we try to write the regular expression<br />
representing identifiers without using regular<br />
definitions, that regular expression will be<br />
complex.<br />
(A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) ) *<br />
Grammar<br />
Regular Definitions<br />
Transition Diagram<br />
• State: represents a condition that could<br />
occur during scanning<br />
– start/initial state:<br />
– accepting/final state: lexeme found<br />
– intermediate state:<br />
• Edge: directs from one state to another,<br />
labeled with one or a set of symbols<br />
Transition Diagram for relop<br />
Transition diagram for relop → < | > | <= | >= | = | <>
Transition-Diagram-Based Lexical Analyzer<br />
Implementation of relop transition diagram<br />
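The code figure for this slide did not survive extraction. As a rough sketch of the idea, the function below walks a relop transition diagram with an explicit `state` variable. The state numbers, the attribute names (`LT`, `LE`, ...), the function name `get_relop`, and the use of `<>` for "not equal" are all assumptions for illustration.

```python
def get_relop(text, pos=0):
    """Simulate the relop transition diagram starting at text[pos].

    Returns ((token name, attribute), next position), or None if no
    relational operator starts at pos.
    """
    state = 0
    while True:
        c = text[pos] if pos < len(text) else ""   # "" stands for end of input
        if state == 0:
            if c == "<":
                state = 1
            elif c == "=":
                return ("relop", "EQ"), pos + 1
            elif c == ">":
                state = 6
            else:
                return None                        # not a relop
            pos += 1                               # consume '<' or '>'
        elif state == 1:                           # seen '<'
            if c == "=":
                return ("relop", "LE"), pos + 1
            if c == ">":
                return ("relop", "NE"), pos + 1
            return ("relop", "LT"), pos            # retract: c is not consumed
        elif state == 6:                           # seen '>'
            if c == "=":
                return ("relop", "GE"), pos + 1
            return ("relop", "GT"), pos            # retract: c is not consumed
```

The two "retract" returns correspond to accepting states that are reached by looking one character too far: the lookahead character is left in the input for the next token.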
Transition Diagram for Others<br />
A transition diagram for id's<br />
A transition diagram for unsigned numbers<br />
Practice<br />
• Draw the transition diagram for recognizing<br />
the following regular expression<br />
a(a|b)*a<br />
[Figure: two transition diagrams for a(a|b)*a over states 1, 2, 3, one nondeterministic and one deterministic]<br />
Finite Automata<br />
• A finite automaton is a recognizer that takes a<br />
string, and answers “yes” if the string matches a<br />
pattern of a specified language, and “no”<br />
otherwise.<br />
• Two kinds:<br />
– Nondeterministic finite automaton (NFA)<br />
• no restriction on the labels of its edges<br />
– Deterministic finite automaton (DFA)<br />
• for each state and each input symbol, exactly one edge with that<br />
symbol leaves the state<br />
• NFAs and DFAs recognize the same class of languages (the regular languages)<br />
• Either an NFA or a DFA can serve as the recognizer in a lexical analyzer<br />
Nondeterministic Finite Automaton (NFA)<br />
• An NFA consists of:<br />
– S: a set of states<br />
– Σ: a set of input symbols (alphabet)<br />
– A transition function: maps state-symbol pairs to sets of<br />
states<br />
– s 0 : a start (initial) state<br />
– F: a set of accepting states (final states)<br />
• An NFA can be represented by a transition graph<br />
• It accepts a string x if and only if there is a path from<br />
the start state to one of the accepting states such that<br />
the edge labels along this path spell out x.<br />
• Remarks<br />
– The same symbol can label edges from one state to<br />
several different states<br />
– An edge may be labeled by ε, the empty string<br />
NFA Example (1)<br />
The language recognized by this NFA is (a|b)*ab<br />
NFA Example (2)<br />
NFA accepting aa*|bb*<br />
Implementing an NFA<br />
S ← ε-closure({s0}) { set of all states reachable from s0 by ε-transitions }<br />
c ← nextchar()<br />
while (c != eof) do<br />
begin<br />
S ← ε-closure(move(S,c)) { move(S,c): set of all states reachable from a state in S by a transition on c }<br />
c ← nextchar()<br />
end<br />
if (S ∩ F != ∅) then { if S contains an accepting state }<br />
return “yes”<br />
else<br />
return “no”<br />
Subset Construction<br />
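The simulation loop can be sketched in Python as follows. The dictionary encoding of the transition function, the use of `""` as the ε label, and the three-state NFA for (a|b)*ab are assumptions for illustration, since the slide's diagram did not survive.

```python
EPS = ""  # label used for ε-transitions in this sketch


def eps_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(states, c, delta):
    """All states reachable from a state in `states` by one transition on c."""
    return {t for s in states for t in delta.get((s, c), ())}


def nfa_accepts(text, delta, start, accepting):
    current = eps_closure({start}, delta)
    for c in text:
        current = eps_closure(move(current, c, delta), delta)
    return bool(current & accepting)   # does S contain an accepting state?


# Hypothetical NFA for (a|b)*ab: state 0 loops on a and b, 0 -a-> 1 -b-> 2.
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
```

For example, `nfa_accepts("aab", DELTA, 0, {2})` tracks the state sets {0}, {0,1}, {0,1}, {0,2} and answers yes because the last set contains state 2.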
Deterministic Finite Automaton (DFA)<br />
• A Deterministic Finite Automaton (DFA) is<br />
a special form of an NFA.<br />
– No state has an ε-transition<br />
– For each symbol a and state s, there is at<br />
most one edge labeled a leaving s.<br />
The language recognized by this DFA is also (a|b)*ab<br />
Implementing a DFA<br />
s ← s0 { start from the initial state }<br />
c ← nextchar() { get the next character from the<br />
input string }<br />
while (c != eof) do { repeat until the end of the string }<br />
begin<br />
s ← move(s,c) { transition function }<br />
c ← nextchar()<br />
end<br />
if (s in F) then { if s is an accepting state }<br />
return “yes”<br />
else<br />
return “no”<br />
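A table-driven version of this loop might look like the sketch below. The transition table is a hypothetical three-state DFA for (a|b)*ab, written down here because the slide's diagram is not available; a missing table entry is treated as a dead state.

```python
def dfa_accepts(text, dtran, start, accepting):
    """Run the DFA on text; dtran maps (state, symbol) to the next state."""
    state = start
    for c in text:
        state = dtran.get((state, c))
        if state is None:          # no transition: treat as a dead state
            return False
    return state in accepting


# Hypothetical DFA for (a|b)*ab.
# State 0: no progress, state 1: just read 'a', state 2: just read "ab".
DTRAN = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
```

Unlike the NFA simulation, each input character costs a single table lookup, which is the "Fast" entry in the NFA vs. DFA comparison.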
NFA vs. DFA<br />
Compactness / Readability / Speed<br />
NFA: Good / Good / Slow<br />
DFA: Bad / Bad / Fast<br />
• DFAs are widely used to build lexical analyzers.<br />
[Figure: an NFA and a DFA that both recognize (a|b)*ab]<br />
Pop Quiz<br />
1) What languages are represented by the following two FAs?<br />
(a) [Figure: FA with states 1 to 9 and transitions on 0 and 1]<br />
Solution: strings over {0,1} of length 4, except 0110<br />
(b) [Figure: FA with states 1 to 5 and transitions on a]<br />
Solution: a(aaaaa)*<br />
Pop Quiz<br />
2) For a language over the alphabet {0,1},<br />
design a DFA that accepts all strings containing<br />
three ‘0’s.<br />
Solution:<br />
[Figure: DFA with a chain of three transitions on 0 and self-loops on 1]<br />
Regular Expression → NFA<br />
• McNaughton-Yamada-Thompson (MYT)<br />
construction<br />
– Simple and systematic<br />
– Construction starts from the simplest parts<br />
(alphabet symbols).<br />
– For a complex regular expression, the NFAs of its<br />
subexpressions are combined to create its NFA.<br />
– Guarantees the resulting NFA will have<br />
exactly one final state and one start state.<br />
MYT Construction<br />
• Basic rules: for subexpressions with no<br />
operators<br />
– For expression ε<br />
[Diagram: start state i with a single edge labeled ε to accepting state f]<br />
– For a symbol a in the alphabet Σ<br />
[Diagram: start state i with a single edge labeled a to accepting state f]<br />
MYT Construction Cont’d<br />
• Inductive rules: for constructing larger<br />
NFAs from the NFAs of subexpressions<br />
(Let N(r1) and N(r2) denote NFAs for regular<br />
expressions r1 and r2, respectively)<br />
– For regular expression r1 | r2<br />
[Diagram: new start state i has ε-edges to the start states of N(r1) and N(r2); their accepting states have ε-edges to the new accepting state f]<br />
MYT Construction Cont’d<br />
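– For regular expression r1 r2<br />
[Diagram: N(r1) followed by N(r2); i is the start state of N(r1), f is the accepting state of N(r2), and the accepting state of N(r1) is merged with the start state of N(r2)]<br />
– For regular expression r*<br />
[Diagram: new start state i has ε-edges to the start state of N(r) and to the new accepting state f; the accepting state of N(r) has ε-edges back to the start state of N(r) and to f]<br />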
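The basic and inductive rules can be sketched as a small builder class. Everything named here (`Builder`, `symbol`, `union`, `concat`, `star`, `accepts`) is hypothetical. One deliberate simplification: for concatenation the sketch links the two fragments with an ε-edge instead of merging the two states as the MYT rule does, which keeps the code short but adds one extra transition per concatenation.

```python
from collections import defaultdict
from itertools import count

EPS = ""  # label used for ε-transitions in this sketch


class Builder:
    """Builds NFA fragments; each fragment is a (start, accept) pair."""

    def __init__(self):
        self.delta = defaultdict(set)   # (state, symbol) -> set of states
        self._ids = count()

    def symbol(self, a):
        i, f = next(self._ids), next(self._ids)
        self.delta[(i, a)].add(f)
        return i, f

    def union(self, n1, n2):
        i, f = next(self._ids), next(self._ids)
        self.delta[(i, EPS)] |= {n1[0], n2[0]}
        self.delta[(n1[1], EPS)].add(f)
        self.delta[(n2[1], EPS)].add(f)
        return i, f

    def concat(self, n1, n2):
        # Simplification: ε-edge instead of merging accept(n1) with start(n2).
        self.delta[(n1[1], EPS)].add(n2[0])
        return n1[0], n2[1]

    def star(self, n):
        i, f = next(self._ids), next(self._ids)
        self.delta[(i, EPS)] |= {n[0], f}
        self.delta[(n[1], EPS)] |= {n[0], f}
        return i, f


def accepts(delta, start, accept, text):
    """Simulate the NFA by tracking the ε-closed set of current states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for t in delta.get((s, EPS), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for c in text:
        current = closure({t for s in current for t in delta.get((s, c), ())})
    return accept in current


b = Builder()
START, ACCEPT = b.concat(b.star(b.union(b.symbol("a"), b.symbol("b"))), b.symbol("a"))
```

The last two lines build an NFA for (a|b)*a bottom-up, in the same order as the subexpressions a, b, (a|b), (a|b)*, (a|b)*a.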
Example: (a|b)*a<br />
[Figure: step-by-step MYT construction of the NFAs for a, b, (a|b), (a|b)*, and (a|b)*a]<br />
Properties of the Constructed NFA<br />
1. N(r) has at most twice as many states as there<br />
are operators and operands in r.<br />
– This bound follows from the fact that each step of<br />
the algorithm creates at most two new states.<br />
2. N(r) has one start state and one accepting<br />
state. The accepting state has no outgoing<br />
transitions, and the start state has no incoming<br />
transitions.<br />
3. Each state of N(r) other than the accepting<br />
state has either one outgoing transition on a<br />
symbol in Σ ∪ {ε} or two outgoing transitions,<br />
both on ε.<br />
Conversion of an NFA to a DFA<br />
• Approach: Subset Construction<br />
– each state of the constructed DFA corresponds to<br />
a set / combination of NFA states<br />
• Details<br />
1 Create the transition table Dtran for the DFA<br />
2 Insert ε-closure(s0) into Dstates as the initial state<br />
3 Pick an unvisited state T in Dstates<br />
4 For each symbol a, create the state<br />
ε-closure(move(T, a)), and add it to Dstates and<br />
Dtran<br />
5 Repeat steps (3) and (4) until all states in<br />
Dstates are visited<br />
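The five steps can be sketched as a worklist algorithm. The NFA below is the usual textbook NFA for (a|b)*abb with states 0 to 10; its numbering is an assumption, because the slide's figure is not available. Empty target sets are skipped, so the dead state is left implicit.

```python
EPS = ""  # label used for ε-transitions in this sketch


def eps_closure(states, delta):
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(delta, start, accepting, alphabet):
    start_set = eps_closure({start}, delta)              # step 2
    dstates, unvisited, dtran = {start_set}, [start_set], {}
    while unvisited:                                     # step 5
        T = unvisited.pop()                              # step 3
        for a in alphabet:                               # step 4
            moved = {t for s in T for t in delta.get((s, a), ())}
            U = eps_closure(moved, delta)
            if not U:
                continue                                 # dead state omitted
            dtran[(T, a)] = U
            if U not in dstates:
                dstates.add(U)
                unvisited.append(U)
    d_accepting = {S for S in dstates if S & accepting}
    return start_set, dstates, dtran, d_accepting


# Assumed NFA for (a|b)*abb (start state 0, accepting state 10).
NFA_DELTA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}
START, DSTATES, DTRAN, DACCEPT = subset_construction(NFA_DELTA, 0, {10}, "ab")
```

Each DFA state is a `frozenset` of NFA states, which makes it usable as a dictionary key in `DTRAN`.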
The Subset Construction<br />
NFA to DFA Example<br />
NFA for (a|b)*abb<br />
Transition table for DFA<br />
Equivalent DFA<br />
Regular Expression → DFA<br />
• First, augment the given regular expression<br />
by concatenating a special symbol #<br />
r → (r)# (augmented regular expression)<br />
• Second, create a syntax tree for the<br />
augmented regular expression.<br />
– All leaves are alphabet symbols (plus # and the<br />
empty string)<br />
– All inner nodes are operators<br />
• Third, number each alphabet symbol (plus #)<br />
(position numbers)<br />
Regular Expression → DFA Cont’d<br />
(a|b)*a → (a|b)*a# (augmented regular expression)<br />
[Figure: syntax tree of (a|b)*a#, with leaves a, b, a, # at positions 1, 2, 3, 4]<br />
• each symbol is at a leaf<br />
• each symbol is numbered (positions)<br />
• inner nodes are operators<br />
followpos<br />
Then we define the function followpos for the positions (positions<br />
assigned to leaves).<br />
followpos(i) -- the set of positions which can follow<br />
the position i in the strings generated by<br />
the augmented regular expression.<br />
Example: (a|b)*a#, with positions a = 1, b = 2, a = 3, # = 4<br />
followpos(1) = {1,2,3}<br />
followpos(2) = {1,2,3}<br />
followpos(3) = {4}<br />
followpos(4) = {}<br />
followpos() is just defined for leaves,<br />
not defined for inner nodes.<br />
firstpos, lastpos, nullable<br />
• To compute followpos, we need three more<br />
functions defined for the nodes (not just for<br />
leaves) of the syntax tree.<br />
– firstpos(n) -- the set of the positions of the first<br />
symbols of strings generated by the subexpression<br />
rooted by n.<br />
– lastpos(n) -- the set of the positions of the last<br />
symbols of strings generated by the subexpression<br />
rooted by n.<br />
– nullable(n) -- true if the empty string is a<br />
member of strings generated by the subexpression<br />
rooted by n; false otherwise<br />
Usage of the Functions<br />
(a|b)*a → (a|b)*a# (augmented regular expression)<br />
[Figure: syntax tree of (a|b)*a#, where m is the star node for (a|b)* and n is the concatenation node for (a|b)*a]<br />
nullable(n) = false<br />
nullable(m) = true<br />
firstpos(n) = {1, 2, 3}<br />
lastpos(n) = {3}<br />
Computing nullable, firstpos, lastpos<br />
Rules for each node n of the syntax tree:<br />
– Leaf labeled ε: nullable(n) = true; firstpos(n) = ∅; lastpos(n) = ∅<br />
– Leaf labeled with position i: nullable(n) = false; firstpos(n) = {i}; lastpos(n) = {i}<br />
– Or-node n = c1 | c2: nullable(n) = nullable(c1) or nullable(c2); firstpos(n) = firstpos(c1) ∪ firstpos(c2); lastpos(n) = lastpos(c1) ∪ lastpos(c2)<br />
– Cat-node n = c1 c2: nullable(n) = nullable(c1) and nullable(c2); firstpos(n) = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1); lastpos(n) = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)<br />
– Star-node n = c1*: nullable(n) = true; firstpos(n) = firstpos(c1); lastpos(n) = lastpos(c1)<br />
How to evaluate followpos<br />
• Two rules define the function followpos:<br />
1. If n is a concatenation node with left child c1 and<br />
right child c2, and i is a position in lastpos(c1),<br />
then all positions in firstpos(c2) are in<br />
followpos(i).<br />
2. If n is a star node, and i is a position in<br />
lastpos(n), then all positions in firstpos(n) are<br />
in followpos(i).<br />
• If firstpos and lastpos have been computed<br />
for each node, followpos of each position<br />
can be computed by making one depth-first<br />
traversal of the syntax tree.<br />
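A single depth-first traversal that computes nullable, firstpos, and lastpos and fills in followpos along the way can be sketched as below. The tuple encoding of the syntax tree (`"leaf"`, `"eps"`, `"or"`, `"cat"`, `"star"`) and the function name `analyze` are assumptions made for this sketch.

```python
def analyze(node, followpos):
    """Return (nullable, firstpos, lastpos) of node; fill followpos in place."""
    kind = node[0]
    if kind == "eps":                                  # leaf labeled ε
        return True, set(), set()
    if kind == "leaf":                                 # ("leaf", symbol, position)
        i = node[2]
        followpos.setdefault(i, set())
        return False, {i}, {i}
    if kind == "or":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        for i in l1:                                   # followpos rule 1
            followpos[i] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f1, l1 = analyze(node[1], followpos)
        for i in l1:                                   # followpos rule 2
            followpos[i] |= f1
        return True, f1, l1
    raise ValueError(f"unknown node kind: {kind!r}")


# Syntax tree of (a|b)*a# with positions a=1, b=2, a=3, #=4.
TREE = ("cat",
        ("cat",
         ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2))),
         ("leaf", "a", 3)),
        ("leaf", "#", 4))
FOLLOWPOS = {}
NULLABLE, FIRSTPOS, LASTPOS = analyze(TREE, FOLLOWPOS)
```

Because children are analyzed before their parent, lastpos and firstpos of the subtrees are already known when the two followpos rules are applied at a concatenation or star node.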
Example -- (a|b)*a#<br />
[Figure: syntax tree of (a|b)*a# with each node annotated by its firstpos (red) and lastpos (blue)]<br />
Then we can calculate followpos<br />
followpos(1) = {1,2,3}<br />
followpos(2) = {1,2,3}<br />
followpos(3) = {4}<br />
followpos(4) = {}<br />
• After we calculate follow positions, we are ready to create<br />
DFA for the regular expression.<br />
Algorithm (RE → DFA)<br />
1. Create the syntax tree of (r)#<br />
2. Calculate nullable, firstpos, lastpos, followpos<br />
3. Put firstpos(root) into the states of DFA as an unmarked state.<br />
4. while (there is an unmarked state S in the states of DFA) do<br />
– mark S<br />
– for each input symbol a do<br />
• let s1,...,sn be the positions in S whose symbol is a<br />
• S’ ← followpos(s1) ∪ ... ∪ followpos(sn)<br />
• Dtran[S,a] ← S’<br />
• if (S’ is not in the states of DFA)<br />
– put S’ into the states of DFA as an unmarked state.<br />
• the start state of DFA is firstpos(root)<br />
• the accepting states of DFA are all states containing the position of #<br />
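Steps 3 and 4 can be sketched as follows, taking firstpos(root) and followpos as given inputs. The function name `build_dfa` and the `symbol_at` dictionary (position to symbol) are assumptions; the sample data is for (a|b)*a# with positions a=1, b=2, a=3, #=4. The sketch only iterates over symbols that actually occur in S, so transitions to the empty state are left out.

```python
def build_dfa(first_root, followpos, symbol_at, end_marker="#"):
    """Build DFA states (sets of positions) directly from followpos."""
    start = frozenset(first_root)
    dstates, unmarked, dtran = {start}, [start], {}
    while unmarked:
        S = unmarked.pop()                              # mark S
        symbols = sorted({symbol_at[p] for p in S} - {end_marker})
        for a in symbols:
            # Union of followpos(p) over the positions p in S labeled a.
            target = frozenset().union(
                *(followpos[p] for p in S if symbol_at[p] == a))
            dtran[(S, a)] = target
            if target not in dstates:
                dstates.add(target)
                unmarked.append(target)
    end_positions = {p for p, sym in symbol_at.items() if sym == end_marker}
    accepting = {S for S in dstates if S & end_positions}
    return start, dtran, accepting


# Data for (a|b)*a#.
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "#"}
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: set()}
S1, DTRAN, ACCEPTING = build_dfa({1, 2, 3}, FOLLOWPOS, SYMBOL_AT)
S2 = frozenset({1, 2, 3, 4})
```

With this input the sketch produces two states, S1 = {1,2,3} and S2 = {1,2,3,4}, with S2 accepting because it contains the position of #.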
Example -- (a|b)*a#, with positions a = 1, b = 2, a = 3, # = 4<br />
followpos(1)={1,2,3} followpos(2)={1,2,3}<br />
followpos(3)={4} followpos(4)={}<br />
S1=firstpos(root)={1,2,3}<br />
mark S1<br />
a: followpos(1) ∪ followpos(3)={1,2,3,4}=S2 Dtran[S1,a]=S2<br />
b: followpos(2)={1,2,3}=S1 Dtran[S1,b]=S1<br />
mark S2<br />
a: followpos(1) ∪ followpos(3)={1,2,3,4}=S2 Dtran[S2,a]=S2<br />
b: followpos(2)={1,2,3}=S1 Dtran[S2,b]=S1<br />
start state: S1<br />
accepting states: {S2}<br />
[Figure: resulting DFA with states S1 and S2 and the transitions listed above]<br />
Example -- (a|ε)bc*#, with positions a = 1, b = 2, c = 3, # = 4<br />
followpos(1)={2} followpos(2)={3,4} followpos(3)={3,4}<br />
followpos(4)={}<br />
S1=firstpos(root)={1,2}<br />
mark S1<br />
a: followpos(1)={2}=S2 Dtran[S1,a]=S2<br />
b: followpos(2)={3,4}=S3 Dtran[S1,b]=S3<br />
mark S2<br />
b: followpos(2)={3,4}=S3 Dtran[S2,b]=S3<br />
mark S3<br />
c: followpos(3)={3,4}=S3 Dtran[S3,c]=S3<br />
start state: S1<br />
accepting states: {S3}<br />
[Figure: resulting DFA with states S1, S2, S3 and the transitions listed above]<br />
Minimizing Number of DFA States<br />
• For any regular language, there is always a unique<br />
minimum-state DFA (up to renaming of states), which can be constructed from<br />
any DFA of the language.<br />
• Algorithm:<br />
– Partition the set of states into two groups:<br />
• G1: set of accepting states<br />
• G2: set of non-accepting states<br />
– For each new group G<br />
• partition G into subgroups such that states s1 and s2 are in the<br />
same subgroup iff<br />
for all input symbols a, states s1 and s2 have transitions to states<br />
in the same group.<br />
– Repeat until no group can be split further.<br />
– Start state of the minimized DFA is the group containing<br />
the start state of the original DFA.<br />
– Accepting states of the minimized DFA are the groups<br />
containing the accepting states of the original DFA.<br />
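The partitioning loop can be sketched as repeated refinement: two states stay together only if, for every symbol, their transitions lead into the same current group. The function name `minimize` and the sample DFA (a five-state DFA for (a|b)*abb with states A to E) are assumptions chosen for illustration.

```python
def minimize(states, alphabet, dtran, accepting):
    """Return the final partition of `states` as a list of sets."""
    accepting = set(accepting)
    partition = [g for g in (accepting, set(states) - accepting) if g]
    while True:
        group_of = {s: idx for idx, g in enumerate(partition) for s in g}
        new_partition = []
        for g in partition:
            buckets = {}
            for s in g:
                # Signature: index of the target group for each input symbol
                # (None when the transition is missing, i.e. a dead state).
                key = tuple(group_of.get(dtran.get((s, a))) for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            new_partition.extend(buckets.values())
        if len(new_partition) == len(partition):
            return new_partition            # no group was split
        partition = new_partition


# Assumed DFA for (a|b)*abb: start state A, accepting state E.
DTRAN = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}
GROUPS = minimize("ABCDE", "ab", DTRAN, {"E"})
```

On this input the groups evolve from {E}, {A,B,C,D} to {E}, {A,B,C}, {D} and finally {E}, {A,C}, {B}, {D}, so states A and C are merged in the minimized DFA.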
Minimizing DFA – Example (1)<br />
[Figure: original DFA with states 1, 2, 3 on inputs a and b]<br />
G1 = {2}<br />
G2 = {1,3}<br />
G2 cannot be partitioned because<br />
Dtran[1,a]=2<br />
Dtran[3,a]=2<br />
Dtran[1,b]=3<br />
Dtran[3,b]=3<br />
So, the minimized DFA (with minimum states) is<br />
[Figure: minimized DFA with states 1 and 2]<br />
Minimizing DFA – Example (2)<br />
[Figure: original DFA with states 1, 2, 3, 4 on inputs a and b]<br />
Transitions on a: 1->2, 2->2, 3->4<br />
Transitions on b: 1->3, 2->3, 3->3<br />
Groups: {1,2,3} {4}<br />
{1,2,3} splits into {1,2} {3}<br />
no more partitioning<br />
[Figure: minimized DFA with states 1, 2, 3]<br />
Architecture of A Lexical Analyzer<br />
An NFA for Lex program<br />
• Create an NFA for each regular expression<br />
• Combine all the NFAs into one<br />
• Introduce a new start state<br />
• Connect it with ε-transitions to the start states of the NFAs<br />
Pattern Matching with NFA<br />
1 The lexical analyzer reads the input and calculates the set of states it is in after each symbol.<br />
2 Eventually, it reaches a point with no next state.<br />
3 It looks backwards in the sequence of sets of states until it finds a set including one or more accepting states.<br />
4 If that set contains several accepting states, it picks the one associated with the earliest pattern in the list from the Lex program.<br />
5 It performs the associated action of the pattern.<br />
Pattern Matching with NFA -- Example<br />
Input: aaba<br />
Report pattern: a*b+<br />
Pattern Matching with DFA<br />
1 Convert the NFA for all the patterns into an equivalent DFA. For each DFA state that contains more than one accepting NFA state, choose the pattern defined earliest as the output of that DFA state.<br />
2 Simulate the DFA until there is no next state.<br />
3 Trace back to the nearest accepting DFA state, and perform the associated action.<br />
Input: abba<br />
DFA states visited: 0137, 247, 58, 68<br />
Report pattern abb<br />
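The two disambiguation rules (longest match first, then earliest pattern) can be illustrated without building the automaton. The sketch below swaps in Python's `re` module for the combined DFA: it tries every pattern at the current position and keeps the longest lexeme, preferring the earlier pattern on a tie. The pattern list a, abb, a*b+ is an assumption (the slides name only abb and a*b+), and `re` is only guaranteed to return the longest match for simple patterns like these.

```python
import re

# Patterns in priority order, as they would appear in a Lex program.
PATTERNS = [
    ("a", re.compile(r"a")),
    ("abb", re.compile(r"abb")),
    ("a*b+", re.compile(r"a*b+")),
]


def next_token(text, pos):
    """Return (pattern name, end offset) of the best match at pos, or None."""
    best = None
    for name, rx in PATTERNS:
        m = rx.match(text, pos)
        # Strict '>' keeps the earlier pattern when two lexemes tie in length.
        if m and m.end() > pos and (best is None or m.end() > best[1]):
            best = (name, m.end())
    return best


def scan(text):
    """Split text into (pattern name, lexeme) pairs using longest match."""
    pos, out = 0, []
    while pos < len(text):
        tok = next_token(text, pos)
        if tok is None:
            raise SyntaxError(f"no pattern matches at offset {pos}")
        name, end = tok
        out.append((name, text[pos:end]))
        pos = end
    return out
```

For the input abba both abb and a*b+ match the three-character prefix abb, and the earlier pattern abb is reported; for aaba only a*b+ matches the longest prefix aab.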
Summary<br />
• How lexical analyzers work<br />
– Convert REs to NFA<br />
– Convert NFA to DFA<br />
– Minimize DFA<br />
– Use the minimized DFA to recognize tokens<br />
in the input<br />
– Use priorities, longest matching rule<br />
Homework<br />
• Exercise 3.7.1 (c)<br />
• Exercise 3.7.3 (c)<br />
• Exercise 3.9.4 (a)<br />
• Due date: Oct. 9, 2014 (Monday)<br />