16.01.2014 Views

Beginning Python - From Novice to Professional


CHAPTER 10 ■ BATTERIES INCLUDED

The function re.escape is a utility function used to escape all the characters in a string that might be interpreted as a regexp operator. Use this if you have a long string with lots of these special characters and you want to avoid typing a lot of backslashes, or if you get a string from a user (for example, through the raw_input function) and want to use it as a part of a regexp.

Here is an example of how it works:

>>> re.escape('www.python.org')
'www\\.python\\.org'
>>> re.escape('But where is the ambiguity?')
'But\\ where\\ is\\ the\\ ambiguity\\?'
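To illustrate why escaping matters when a string from a user becomes part of a regexp, here is a minimal sketch. The value of user_text and the strings being searched are made up for illustration; they do not come from the text above.

```python
import re

# Hypothetical user input; in a real program this might be read interactively.
user_text = 'a.b'

# Without escaping, '.' acts as a wildcard, so the pattern also matches 'aXb'.
unescaped = re.search(user_text, 'aXb')

# With re.escape, the dot only matches a literal dot.
escaped_pattern = re.escape(user_text)
no_match = re.search(escaped_pattern, 'aXb')
literal_match = re.search(escaped_pattern, 'see a.b here')

print(unescaped is not None)    # True
print(no_match is None)         # True
print(literal_match.group(0))   # a.b
```

The escaped pattern can be embedded in a larger regexp in the same way, for example by concatenating it with other pattern fragments.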

■Note In Table 10-8 you’ll notice that some of the functions have an optional parameter called flags. This parameter can be used to change how the regular expressions are interpreted. For more information about this, see the standard library reference, in the section about the re module at http://python.org/doc/lib/module-re.html. The flags are described in the subsection “Module Contents.”
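As a small illustration of the flags parameter, the sketch below passes re.IGNORECASE to re.search. The sample text is invented for the example, and re.IGNORECASE is only one of several flags defined in the re module.

```python
import re

# Sample text made up for this illustration.
text = 'Visit WWW.PYTHON.ORG today'
pattern = r'www\.python\.org'

# By default, matching is case sensitive, so no match is found.
case_sensitive = re.search(pattern, text)

# The optional flags argument changes how the pattern is interpreted.
case_insensitive = re.search(pattern, text, re.IGNORECASE)

print(case_sensitive)               # None
print(case_insensitive.group(0))    # WWW.PYTHON.ORG
```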

Match Objects and Groups<br />

The re functions that try to match a pattern against a section of a string all return MatchObjects when a match is found. These objects contain information about the substring that matched the pattern. They also contain information about which parts of the pattern matched which parts of the substring; these “parts” are called groups.

A group is simply a subpattern that has been enclosed in parentheses. The groups are numbered by their left parenthesis. Group zero is the entire pattern. So, in the pattern

'There (was a (wee) (cooper)) who (lived in Fyfe)'

the groups are as follows:

0 There was a wee cooper who lived in Fyfe
1 was a wee cooper
2 wee
3 cooper
4 lived in Fyfe
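The numbering above can be checked with a short sketch that matches the pattern against the literal sentence and asks the match object for each group by number. The group method used here is a standard method of match objects.

```python
import re

pattern = 'There (was a (wee) (cooper)) who (lived in Fyfe)'
m = re.match(pattern, 'There was a wee cooper who lived in Fyfe')

# Groups are numbered by the position of their left parenthesis;
# group 0 is the text matched by the entire pattern.
groups = [m.group(i) for i in range(5)]

for number, text in enumerate(groups):
    print('%d %s' % (number, text))
```

Because the pattern contains no special characters inside the parentheses, each group simply matches its own literal text.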

Typically, the groups contain special characters such as wildcards or repetition operators, and thus you may be interested in knowing what a given group has matched. For example, in the pattern

r'www\.(.+)\.com$'

group 0 would contain the entire string, and group 1 would contain everything between 'www.' and '.com'. By creating patterns like this, you can extract the parts of a string that interest you.
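A minimal sketch of this extraction follows. The host names are placeholders chosen for illustration, not values from the text.

```python
import re

pattern = r'www\.(.+)\.com$'

# Placeholder host name used only for illustration.
m = re.match(pattern, 'www.example.com')
print(m.group(0))   # www.example.com
print(m.group(1))   # example

# The + operator is greedy, so group 1 takes everything up to the final '.com'.
m2 = re.match(pattern, 'www.docs.example.com')
print(m2.group(1))  # docs.example
```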

Some of the more important methods of re match objects are described in Table 10-9.
