Intel XENIX 286 Programmers Guide (86) - Tenox.tc
XENIX Programming lex: Lexical Analyzer Generator

Or you might want to treat this as = -a. To do this, just return the minus sign as well as the letter to the input. The following performs the interpretation:

    =-[a-zA-Z]    {
                  printf("Operator (=-) ambiguous\n");
                  yyless(yyleng-2);
                  ... action for = ...
                  }

Note that the expressions for the two cases might more easily be written

    =-/[A-Za-z]

in the first case and

    =/-[A-Za-z]

in the second: no backup would be required in the rule action. It is not necessary to recognize the whole identifier to observe the ambiguity. The possibility of =-3, however, makes

    =-/[^ \t\n]

a still better rule.

In addition to these routines, lex also permits access to the I/O routines it uses. They include

• input() returns the next input character.
• output(c) writes the character c on the output.
• unput(c) pushes the character c back onto the input stream to be read later by input().

By default these routines are provided as macro definitions, but the user can override them and supply private versions. These routines define the relationship between external files and internal characters and must all be retained or modified consistently. They may be redefined to cause input or output to be transmitted to or from strange places, including other programs or internal memory, subject to three conditions: the character set used must be consistent in all routines; a value of zero returned by input must mean end-of-file; and the relationship between unput and input must be retained or the look-ahead will not work.

lex does not look ahead at all if it does not have to, but every rule containing a slash (/) or ending in one of the following characters implies look-ahead:

    + * ? $

Look-ahead is also necessary to match an expression that is a prefix of another expression. The standard lex library imposes a 100-character limit on backup.
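The contract between input and unput described above can be sketched in C. This is a minimal illustration, not the lex library's own code: the names my_input, my_unput, my_output, set_source, and BACKUP_LIMIT are hypothetical, chosen so they do not clash with the real macros, and the input source is assumed to be an in-memory string.

```c
#include <stdio.h>

#define BACKUP_LIMIT 100            /* mirrors the 100-character backup limit quoted above */

static const char *src = "";        /* hypothetical in-memory input source */
static char pushback[BACKUP_LIMIT]; /* characters returned by my_unput, last in first out */
static int  npushed = 0;

/* Point the reader at a new in-memory string and discard any pushback. */
void set_source(const char *s)
{
    src = s;
    npushed = 0;
}

/* Return the next character; 0 means end-of-file, as lex requires. */
int my_input(void)
{
    if (npushed > 0)
        return (unsigned char)pushback[--npushed];
    if (*src == '\0')
        return 0;
    return (unsigned char)*src++;
}

/* Push c back so that the next my_input() returns it.
   Returns 0 on success, -1 if the backup buffer is full. */
int my_unput(int c)
{
    if (npushed >= BACKUP_LIMIT)
        return -1;
    pushback[npushed++] = (char)c;
    return 0;
}

/* Write c on the standard output. */
void my_output(int c)
{
    putchar(c);
}
```

Because pushed-back characters are returned most recent first, a caller that wants to reread a string must push it back in reverse order. Note also that a source string cannot contain a null character here, for the same reason the text gives: 0 is reserved for end-of-file.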
Another lex library routine that you sometimes want to redefine is yywrap, which is
called whenever lex reaches an end-of-file. If yywrap returns a 1, lex continues with
the normal wrapup on end of input. Sometimes, however, it is convenient to arrange for
more input to arrive from a new source. In this case, the user should provide a yywrap
that arranges for new input and returns 0. This instructs lex to continue processing.
The default yywrap always returns 1.

This routine is also a convenient place to print tables, summaries, etc. at the end of a
program. Note that you cannot write a normal rule that recognizes end-of-file; the only
access to this condition is through yywrap. In fact, unless a private version of input is
supplied, a file containing nulls cannot be handled, since a value of 0 returned by input is
taken to be end-of-file.
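
As a rough sketch of a private yywrap that arranges for more input, the following C fragment steps through a list of file names. It assumes the scanner reads from a FILE pointer named yyin, as many lex implementations do; that variable is not described in this section, so treat it as an assumption. The helper set_inputs and the variables files, nfiles, and next_file are hypothetical names introduced for illustration.

```c
#include <stdio.h>

/* Stand-in so the sketch compiles alone. In a real lex program the
   generated scanner is assumed to define yyin; declare it extern there. */
FILE *yyin;

static const char **files = NULL;  /* hypothetical list of remaining input files */
static int nfiles = 0;
static int next_file = 0;

/* Record the files to be scanned after the current input is exhausted. */
void set_inputs(const char **names, int count)
{
    files = names;
    nfiles = count;
    next_file = 0;
}

/* Called at end-of-file. Returns 0 if a new source was arranged,
   1 if lex should proceed with the normal wrapup. */
int yywrap(void)
{
    while (next_file < nfiles) {
        FILE *f = fopen(files[next_file++], "r");
        if (f != NULL) {
            if (yyin != NULL && yyin != stdin)
                fclose(yyin);      /* finished with the previous file */
            yyin = f;
            return 0;              /* more input: continue processing */
        }
        /* This file could not be opened; try the next one. */
    }
    /* End of all input: a convenient place to print tables or summaries. */
    return 1;
}
```

A main routine would call set_inputs with the remaining command-line arguments before starting the scanner. Files that cannot be opened are skipped silently in this sketch; a real program would probably report them.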

Handling Ambiguous Source Rules

lex can handle ambiguous specifications. When more than one expression can match the
current input, lex chooses as follows:

• The longest match is preferred.
• Among rules that match the same number of characters, the first given rule is
preferred.

For example, suppose the following rules are given:

    integer    keyword action ...;
    [a-z]+     identifier action ...;

If the input is integers, it is taken as an identifier, because [a-z]+ matches eight
characters while integer matches only seven. If the input is integer, both rules match
seven characters, and the keyword rule is selected because it was given first. Anything
shorter (e.g., int) does not match the expression integer, so the identifier
interpretation is used.
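
The two selection rules can be modeled in a few lines of C. This is a simulation of the decision only, under the assumption of an ASCII character set; it is not how lex implements matching (lex compiles the rules into a finite automaton). The function names match_keyword, match_identifier, and choose_rule are hypothetical.

```c
#include <string.h>

/* Rule 0: length matched by the literal "integer" at the start of s, or 0. */
static int match_keyword(const char *s)
{
    size_t n = strlen("integer");
    return strncmp(s, "integer", n) == 0 ? (int)n : 0;
}

/* Rule 1: length of the longest prefix of s matching [a-z]+, or 0.
   Assumes the lowercase letters are contiguous, as in ASCII. */
static int match_identifier(const char *s)
{
    int n = 0;
    while (s[n] >= 'a' && s[n] <= 'z')
        n++;
    return n;
}

/* Return the index of the selected rule (0 = keyword, 1 = identifier),
   or -1 if no rule matches. The matched length is stored in *len. */
int choose_rule(const char *s, int *len)
{
    int (*rules[])(const char *) = { match_keyword, match_identifier };
    int best = -1;
    int best_len = 0;
    int i;

    for (i = 0; i < 2; i++) {
        int n = rules[i](s);
        /* Strictly longer wins; on a tie the earlier rule is kept. */
        if (n > best_len) {
            best = i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

With these definitions, choose_rule selects the identifier rule with length 8 for "integers", the keyword rule with length 7 for "integer", and the identifier rule with length 3 for "int". The strict comparison in the loop is what gives the first rule priority on ties.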

The principle of preferring the longest match makes certain constructions dangerous,
such as the following: