16.01.2014 Views

Beginning Python - From Novice to Professional


CHAPTER 10 ■ BATTERIES INCLUDED

SPECIAL CHARACTERS IN CHARACTER SETS

In general, special characters such as dots, asterisks, and question marks have to be escaped with a backslash if you want them to appear as literal characters in the pattern, rather than function as regexp operators. Inside character sets, escaping these characters is generally not necessary (although perfectly legal). You should, however, keep in mind the following rules:

You do have to escape the caret (^) if it appears at the beginning of the character set unless you want it to function as a negation operator. (In other words, don’t place it at the beginning unless you mean it.)

Similarly, the right bracket (]) and the dash (-) must be put either at the beginning of the character set or escaped with a backslash. (Actually, the dash may also be put at the end, if you wish.)
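The rules above can be checked with a short sketch using Python's standard re module. The sample string and the variable names are made up for illustration; only re.findall and the character-set syntax come from the library.

```python
import re

# Hypothetical sample text containing each special character once.
text = 'a.b*c?d^e]f-g'

# Dots, asterisks, and question marks are literal inside a character set.
literal_specials = re.findall(r'[.*?]', text)

# A caret at the beginning negates the set: everything except a through g.
negated = re.findall(r'[^a-g]', text)

# An escaped caret is just a literal caret.
escaped_caret = re.findall(r'[\^a]', text)

# A right bracket placed first in the set is literal.
bracket_first = re.findall(r'[]a]', text)

# A dash is literal when placed first, placed last, or escaped.
dash_first = re.findall(r'[-a]', text)
dash_last = re.findall(r'[a-]', text)
dash_escaped = re.findall(r'[a\-g]', text)

print(literal_specials)  # ['.', '*', '?']
print(negated)           # ['.', '*', '?', '^', ']', '-']
print(escaped_caret)     # ['a', '^']
print(bracket_first)     # ['a', ']']
print(dash_escaped)      # ['a', '-', 'g']
```

Note that r'[a\-g]' matches only the three characters a, -, and g, whereas the unescaped r'[a-g]' would be read as the range a through g.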

Alternatives and Subpatterns

Character sets are nice when you let each letter vary independently, but what if you want to match only the strings 'python' and 'perl'? You can’t specify such a specific pattern with character sets or wildcards. Instead, you use the special character for alternatives: the “pipe” character (|). So, your pattern would be 'python|perl'.

However, sometimes you want to apply the choice operator to only a part of the pattern, not all of it. To do that, you enclose the part, or subpattern, in parentheses. The previous example could be rewritten as 'p(ython|erl)'. (Note that the term subpattern can also refer to a single character.)
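As a minimal sketch of the difference the parentheses make, the following compares the two patterns from the text against a third, deliberately wrong one. The word list and variable names are illustrative assumptions, and re.fullmatch (Python 3.4 and later) is used here so that only whole-string matches count.

```python
import re

# The two equivalent patterns from the text.
whole = re.compile(r'python|perl')
grouped = re.compile(r'p(ython|erl)')

# Without parentheses, the pipe splits the entire pattern:
# this means "python" or "erl", not "p" followed by "ython" or "erl".
ungrouped = re.compile(r'python|erl')

# Hypothetical test words.
words = ['python', 'perl', 'pearl', 'erl']

whole_hits = [w for w in words if whole.fullmatch(w)]
grouped_hits = [w for w in words if grouped.fullmatch(w)]
ungrouped_hits = [w for w in words if ungrouped.fullmatch(w)]

print(whole_hits)      # ['python', 'perl']
print(grouped_hits)    # ['python', 'perl']
print(ungrouped_hits)  # ['python', 'erl']
```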

Optional and Repeated Subpatterns

By adding a question mark after a subpattern, you make it optional. It may appear in the matched string, but it isn’t strictly required. So, for example, the (slightly unreadable) pattern

r'(http://)?(www\.)?python\.org'

would match all of the following strings (and nothing else):

'http://www.python.org'
'http://python.org'
'www.python.org'
'python.org'

A few things are worth noting here:

• I’ve escaped the dots, to prevent them from functioning as wildcards.
• I’ve used a raw string to reduce the number of backslashes needed.
• Each optional subpattern is enclosed in parentheses.
• The optional subpatterns may appear or not, independently of each other.
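The points in this list can be tried out with a small sketch. The candidate strings beyond the four listed in the text are made up for illustration, and re.fullmatch is used on the assumption that the whole string should match the pattern.

```python
import re

url = re.compile(r'(http://)?(www\.)?python\.org')

# The four strings from the text, plus a few hypothetical near misses.
candidates = [
    'http://www.python.org',
    'http://python.org',
    'www.python.org',
    'python.org',
    'ftp://python.org',   # different scheme
    'wwwXpython.org',     # the dot after www is escaped, so X is rejected
    'pythonXorg',         # the dot before org is escaped as well
]

accepted = [s for s in candidates if url.fullmatch(s)]
print(accepted)
# ['http://www.python.org', 'http://python.org', 'www.python.org', 'python.org']

# Each optional subpattern is a group; an absent one is reported as None.
groups_bare = url.fullmatch('python.org').groups()
groups_www = url.fullmatch('www.python.org').groups()
print(groups_bare)  # (None, None)
print(groups_www)   # (None, 'www.')

# With an unescaped dot, the wildcard would accept any character there.
unescaped_matches = re.fullmatch(r'python.org', 'pythonXorg') is not None
print(unescaped_matches)  # True
```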
