16.01.2014 Views

Beginning Python - From Novice to Professional

Beginning Python - From Novice to Professional

Beginning Python - From Novice to Professional

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

428 CHAPTER 22 ■ PROJECT 3: XML FOR ALL OCCASIONS<br />

Creating HTML Pages<br />

Now you’re ready <strong>to</strong> make the pro<strong>to</strong>type. For now, let’s ignore the direc<strong>to</strong>ries and concentrate<br />

on creating HTML pages. You have <strong>to</strong> create a slightly embellished event handler that does<br />

the following:<br />

• At the start of each page element, opens a new file with the given name, and writes a suitable<br />

HTML header <strong>to</strong> it, including the given title<br />

• At the end of each page element, writes a suitable HTML footer <strong>to</strong> the file, and closes it<br />

• While inside the page element, passes through all tags and characters without modifying<br />

them (writes them <strong>to</strong> the file as they are)<br />

• While not inside a page element, ignores all tags (such as website and direc<strong>to</strong>ry<br />

Most of this is pretty straightforward (at least if you know a bit about how HTML documents<br />

are constructed). There are two problems, however, which may not be completely obvious.<br />

First, you can’t simply “pass through” tags (write them directly <strong>to</strong> the HTML file you’re<br />

building) because you are given their names only (and possibly some attributes). You have <strong>to</strong><br />

reconstruct the tags (with angle brackets and so forth) yourself.<br />

Second, SAX itself gives you no way of knowing whether you are currently “inside” a page<br />

element. You have <strong>to</strong> keep track of that sort of thing yourself (as we did in the HeadlineHandler<br />

example). For this project, we’re only interested in whether or not <strong>to</strong> pass through tags and<br />

characters, so we’ll use a Boolean variable called passthrough, which we’ll update as we enter<br />

and leave the pages.<br />

See Listing 22-2 for the code for the simple program.<br />

Listing 22-2. A Simple Page Maker Script (pagemaker.py)<br />

from xml.sax.handler import ContentHandler<br />

from xml.sax import parse<br />

class PageMaker(ContentHandler):<br />

passthrough = False<br />

def startElement(self, name, attrs):<br />

if name == 'page':<br />

self.passthrough = True<br />

self.out = open(attrs['name'] + '.html', 'w')<br />

self.out.write('\n')<br />

self.out.write('%s\n' % attrs['title'])<br />

self.out.write('\n')<br />

elif self.passthrough:<br />

self.out.write('')

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!