16.01.2014 Views

Beginning Python - From Novice to Professional

Beginning Python - From Novice to Professional

Beginning Python - From Novice to Professional

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

394 CHAPTER 20 ■ PROJECT 1: INSTANT MARKUP<br />

First Implementation<br />

One of the first things you need <strong>to</strong> do is split the text in<strong>to</strong> paragraphs. It’s obvious from Listing 20-1<br />

that the paragraphs are separated by one or more empty lines. A better word than “paragraph”<br />

might be “block” because this name can apply <strong>to</strong> headlines and list items as well.<br />

A simple way <strong>to</strong> find these blocks is <strong>to</strong> collect all the lines you encounter until you find an<br />

empty line, and then return the lines you have collected so far. That would be one block. Then,<br />

you could start all over again. You needn’t bother collecting empty lines, and you won’t return<br />

empty blocks (where you have encountered more than one empty line). Also, you should make<br />

sure that the last line of the file is empty; otherwise you won’t know when the last block is<br />

finished. (There are other ways of finding out, of course.)<br />

Listing 20-2 shows an implementation of this approach.<br />

Listing 20-2. A Text Block Genera<strong>to</strong>r (util.py)<br />

from __future__ import genera<strong>to</strong>rs<br />

def lines(file):<br />

for line in file: yield line<br />

yield '\n'<br />

def blocks(file):<br />

block = []<br />

for line in lines(file):<br />

if line.strip():<br />

block.append(line)<br />

elif block:<br />

yield ''.join(block).strip()<br />

block = []<br />

The lines genera<strong>to</strong>r is just a little utility that tucks an empty line on<strong>to</strong> the end of the file.<br />

The blocks genera<strong>to</strong>r implements the approach described. When a block is yielded, its lines are<br />

joined, and the resulting string is stripped, giving you a single string representing the block,<br />

with excessive whitespace at either end (such as list indentations or newlines) removed.<br />

■Note The from __future__ import genera<strong>to</strong>rs statement is necessary in <strong>Python</strong> version 2.2, but<br />

not in 2.3 or 2.4. See also the section “Avoiding Genera<strong>to</strong>rs” in Chapter 9.<br />

I’ve put the code in the file util.py, which means that you can import the utility genera<strong>to</strong>rs<br />

in your program later on.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!