Intel XENIX 286 Programmers Guide (86) - Tenox.tc


09.06.2013 Views

XENIX Programming lex: Lexical Analyzer Generator

The library is accessed by the linker flag -ll, so an appropriate set of commands is

    lex source
    cc lex.yy.c -ll

The resulting program is placed in the usual file a.out for later execution. To use lex with yacc, see the section "lex and yacc" later in this chapter and also Chapter 10, "yacc: Compiler-Compiler." Although the default lex I/O routines use the C standard library, the lex automata themselves do not. If private versions of input, output, and unput are given, the standard C library can be avoided.

Specifying Character Classes

Classes of characters can be specified using brackets: [ and ]. The construction

    [abc]

matches a single character, which may be a, b, or c. Within square brackets, most operator meanings are ignored. Only three characters are special: the backslash (\), the hyphen (-), and the caret (^). The hyphen indicates ranges. For example

    [a-z0-9<>_]

indicates the character class containing all the lowercase letters, the digits, the angle brackets, and underscore. Ranges may be given in either ascending or descending order. Using the hyphen between any pair of characters that are not both uppercase letters, both lowercase letters, or both digits is implementation dependent and causes a warning message. If you want the hyphen in a character class, it should be first or last; thus

    [-+0-9]

matches all the digits and the plus and minus signs.

In character classes, the caret (^) operator must appear as the first character after the left bracket; it indicates that the resulting string is to be complemented with respect to the computer character set. Thus

    [^abc]

matches all characters except a, b, or c, including all special or control characters; or

    [^a-zA-Z]

is any character that is not a letter. The backslash (\) provides an escape mechanism within character class brackets, so that characters can be entered literally by preceding them with this character.
Escaping into octal is possible although nonportable. For example

    [\40-\176]

matches all printable characters in the ASCII character set, from octal 40 (blank) to octal 176 (tilde).
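
The character classes described above can be tried out without running lex itself. The following is a minimal sketch, not part of lex: it assumes a system that provides the POSIX regcomp and regexec routines, whose bracket expressions treat ranges, a leading or trailing hyphen, and a leading caret in the same way for the classes shown here. The helper names class_matches and is_printable_ascii are hypothetical. Because POSIX bracket expressions do not accept backslash escapes, the octal range is written with C octal character constants instead.

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if the single character c matches the bracket expression
 * cls (for example "[abc]"), 0 if it does not, and -1 if the pattern
 * cannot be built or compiled.
 * Assumption: POSIX regcomp() stands in for lex in this sketch. */
int class_matches(const char *cls, char c)
{
    char pattern[64];
    char subject[2];
    regex_t re;
    int n, rc;

    subject[0] = c;
    subject[1] = '\0';

    /* Anchor the class so that exactly one character must match. */
    n = snprintf(pattern, sizeof pattern, "^%s$", cls);
    if (n < 0 || n >= (int)sizeof pattern)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* The lex class [\40-\176], written with C octal character constants.
 * Like the lex form, it assumes the ASCII character set. */
int is_printable_ascii(int c)
{
    return c >= '\40' && c <= '\176';
}
```

With these helpers, class_matches("[-+0-9]", '-') returns 1, class_matches("[^abc]", 'a') returns 0, and is_printable_ascii('\n') returns 0.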

Specifying an Arbitrary Character

To match almost any character, the period (.) designates the class of all characters except a newline.

Specifying Optional Expressions

The question mark (?) operator indicates an optional element of an expression. Thus

    ab?c

matches either ac or abc. Note that the meaning of the question mark here differs from its meaning in the shell.

Specifying Repeated Expressions

Repetitions of classes are indicated by the asterisk (*) and plus (+) operators. For example

    a*

matches any number of consecutive a characters, including zero, while

    a+

matches one or more instances of a. Similarly,

    [a-z]+

matches all strings of lowercase letters, and

    [A-Za-z][A-Za-z0-9]*

matches all alphanumeric strings with a leading alphabetic character; this is a typical expression for recognizing identifiers in computer languages.
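
The optional and repetition operators can be checked in the same hedged way. This sketch again assumes the POSIX regcomp and regexec routines as a stand-in for lex; the helper name whole_match is hypothetical. It tests whether an entire string matches an expression. The period is left out on purpose, since a POSIX period may match a newline unless REG_NEWLINE is set, which differs from the lex rule.

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if all of text matches the extended regular expression
 * expr, 0 if it does not, and -1 if the pattern cannot be built or
 * compiled.
 * Assumption: POSIX regcomp() stands in for lex in this sketch. */
int whole_match(const char *expr, const char *text)
{
    char pattern[128];
    regex_t re;
    int n, rc;

    /* Parenthesize and anchor so the whole string must match. */
    n = snprintf(pattern, sizeof pattern, "^(%s)$", expr);
    if (n < 0 || n >= (int)sizeof pattern)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For instance, whole_match("ab?c", "ac") and whole_match("ab?c", "abc") both return 1, whole_match("a+", "") returns 0 while whole_match("a*", "") returns 1, and whole_match("[A-Za-z][A-Za-z0-9]*", "9lives") returns 0 because the first character is not alphabetic.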

