Beginning Python - From Novice to Professional
CHAPTER 10 ■ BATTERIES INCLUDED

Table 10-8. Some Important Functions in the re Module

Function                                Description
compile(pattern[, flags])               Creates a pattern object from a string with a regexp
search(pattern, string[, flags])        Searches for pattern in string
match(pattern, string[, flags])         Matches pattern at the beginning of string
split(pattern, string[, maxsplit=0])    Splits a string by occurrences of pattern
findall(pattern, string)                Returns a list of all occurrences of pattern in string
sub(pat, repl, string[, count=0])       Substitutes occurrences of pat in string with repl
escape(string)                          Escapes all special regexp characters in string

The function re.compile transforms a regular expression (written as a string) to a pattern object, which can be used for more efficient matching. If you use regular expressions represented as strings when you call functions such as search or match, they have to be transformed into regular expression objects internally anyway. By doing this once, with the compile function, this step is no longer necessary each time you use the pattern. The pattern objects have the searching/matching functions as methods, so re.search(pat, string) (where pat is a regexp written as a string) is equivalent to pat.search(string) (where pat is a pattern object created with compile). Compiled regexp objects can also be used in the normal re functions.

The function re.search searches a given string to find the first substring, if any, that matches the given regular expression. If one is found, a MatchObject (evaluating to true) is returned; otherwise None (evaluating to false) is returned. Due to the nature of the return values, the function can be used in conditional statements, such as

if re.search(pat, string):
    print('Found it!')

However, if you need more information about the matched substring, you can examine the returned MatchObject. (More about MatchObjects in the next section.)
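The equivalence between the module-level functions and the pattern-object methods can be illustrated with a short sketch. The sample text and the pattern r'py\w+' are assumptions chosen only for illustration, and the code uses print as a function so that it also runs on current Python versions.

```python
import re

# Hypothetical sample data, chosen only for illustration.
text = 'The python.org site documents the re module.'

# Compile once; the pattern object has search and match as methods.
word_pat = re.compile(r'py\w+')

m_function = re.search(r'py\w+', text)  # module-level function, string pattern
m_method = word_pat.search(text)        # method on the compiled pattern
m_mixed = re.search(word_pat, text)     # compiled pattern passed to re.search

# All three calls find the same first occurrence.
found = [m.group(0) for m in (m_function, m_method, m_mixed)]

# A failed search returns None, which counts as false in a condition.
missing = word_pat.search('no match here')

if m_method:
    print('Found it!')
```

Compiling is mainly worthwhile when the same pattern is reused many times, for example inside a loop.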
The function re.match tries to match a regular expression at the beginning of a given string. So match('p', 'python') returns a true value, while match('p', 'www.python.org') returns a false one. (The return values are the same as those for search.)

■Note The match function will report a match if the pattern matches the beginning of a string; the pattern is not required to match the entire string. If you want to require that, you have to add a dollar sign to the end of your pattern; the dollar sign will match the end of the string and thereby “stretch out” the match.
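The difference between match and search, and the effect of a trailing dollar sign, might be sketched as follows. The strings 'python3' and the pattern '[a-z]+' are assumptions introduced only for this illustration.

```python
import re

# match only looks at the start of the string; search scans all of it.
at_start = re.match('p', 'python')              # a match object (true)
not_at_start = re.match('p', 'www.python.org')  # None (false)
anywhere = re.search('p', 'www.python.org')     # a match object (true)

# Without '$', matching a prefix of the string is enough.
prefix_only = re.match('[a-z]+', 'python3')     # matches 'python'

# With '$', the match has to extend to the end of the string.
whole_string = re.match('[a-z]+$', 'python3')   # None: '3' is not a letter
whole_ok = re.match('[a-z]+$', 'python')        # matches 'python'
```

In Python 3.4 and later, re.fullmatch provides a similar whole-string check without the explicit dollar sign.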
The function re.split splits a string by the occurrences of a pattern. This is similar to the string method split, except that you allow full regular expressions instead of only a fixed separator string. For example, with the string method split you could split a string by the occurrences of the string ', ' but with re.split you can split on any sequence of space characters and commas:

>>> some_text = 'alpha, beta,,,,gamma delta'
>>> re.split('[, ]+', some_text)
['alpha', 'beta', 'gamma', 'delta']

As you can see from this example, the return value is a list of substrings. The maxsplit argument indicates the maximum number of splits allowed:

>>> re.split('[, ]+', some_text, maxsplit=2)
['alpha', 'beta', 'gamma delta']
>>> re.split('[, ]+', some_text, maxsplit=1)
['alpha', 'beta,,,,gamma delta']

■Note If the pattern contains parentheses, the parenthesized groups are interspersed between the split substrings.

The function re.findall returns a list of all occurrences of the given pattern. For example, to find all words in a string, you could do the following:

>>> pat = '[a-zA-Z]+'
>>> text = '"Hm... Err -- are you sure?" he said, sounding insecure.'
>>> re.findall(pat, text)
['Hm', 'Err', 'are', 'you', 'sure', 'he', 'said', 'sounding', 'insecure']

Or, you could find the punctuation:

>>> pat = r'[.?\-",]+'
>>> re.findall(pat, text)
['"', '...', '--', '?"', ',', '.']

Note that the dash (-) has been escaped so Python won’t interpret it as part of a character range (such as a-z).

The function re.sub is used to substitute the leftmost, nonoverlapping occurrences of a pattern with a given replacement. Consider the following example:

>>> pat = '{name}'
>>> text = 'Dear {name}...'
>>> re.sub(pat, 'Mr. Gumby', text)
'Dear Mr. Gumby...'

See the section “Using Group Numbers and Functions in Substitutions” later in this chapter for information on how to use this function more effectively.
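Two details from Table 10-8 and the note on parentheses are not demonstrated above: what re.split returns when the pattern contains a group, and how the count argument of re.sub limits replacements. The following is a minimal sketch; the template string is an assumption made up for this example, and re.escape is used here only to make sure the braces are treated literally.

```python
import re

some_text = 'alpha, beta,,,,gamma delta'

# With a parenthesized group, the matched separators are kept in the result.
with_separators = re.split('([, ]+)', some_text)

# Hypothetical template with two placeholders, for illustration only.
template = 'Dear {name}, goodbye {name}.'

# re.escape guards any special regexp characters in the placeholder.
pat = re.escape('{name}')

# count=0 (the default) replaces every occurrence; count=1 only the first.
all_replaced = re.sub(pat, 'Mr. Gumby', template)
first_only = re.sub(pat, 'Mr. Gumby', template, count=1)

print(with_separators)
print(all_replaced)
print(first_only)
```

Keeping the separators can be handy when the pieces need to be joined back together after processing.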