Beginning Python - From Novice to Professional
Beginning Python - From Novice to Professional Beginning Python - From Novice to Professional
CHAPTER 22 ■ PROJECT 3: XML FOR ALL OCCASIONS 423 If this does not raise an exception, you’re all set. If it does, you have to install PyXML. First, download the PyXML package from http://sf.net/projects/pyxml. There you can find RPM packages for Linux, binary installers for Windows, and source distributions for other platforms. The RPMs are installed with rpm --install, and the binary Windows distribution is installed simply by executing it. The source distribution is installed through the standard Python installation mechanism, the Distribution Utilities (Distutils). Simply unpack the tar.gz file, change to the unpacked directory, and execute the following: $ python setup.py install You should now be able to use the XML tools. ■Tip There are plenty of XML tools for Python out there. One very interesting alternative to the “standard” PyXML framework is Fredrik Lundh’s ElementTree (and the C implementation, cElementTree), available from http://effbot.org/zone. It’s quite powerful and easy to use, and may well be worth a look if you’re serious about using XML in Python. Preparations Before you can write the program that processes your XML files, you need to design your XML format. What tags do you need, what attributes should they have, and which tags should go where? To find out, let’s first consider what it is you want your XML to describe. The main concepts are Web site, directory, page, name, title, and contents: • Web site. You won’t be storing any information about the Web site itself, so this is just the top-level element enclosing all the files and directories. • Directory. A directory is mainly a container for files and other directories. • Page. This is a single Web page. • Name. Both directories and Web pages need names—these will be used as directory names and file names as they will appear in the file system and the corresponding URLs. • Title. Each Web page should have a title (not the same as its file name). • Contents. Each Web page will also have some contents. You’ll just use plain XHTML to represent the contents here—that way, you can just pass it through to the final Web pages and let the browsers interpret it. In short: your document will consist of a single website element, containing several directory and page elements, each of the directory elements optionally containing more pages and directories. The directory and page elements will have an attribute called name, which will contain their name. In addition, the page tag has a title attribute. The page element contains XHTML code (of the type found inside the XHTML body tag). An example file is shown in Listing 22-1.
424 CHAPTER 22 ■ PROJECT 3: XML FOR ALL OCCASIONS Listing 22-1. A Simple Web Site Represented As an XML File (website.xml) Welcome to My Home Page Hi, there. My name is Mr. Gumby, and this is my home page. Here are some of my interests: Shouting Sleeping Eating Mr. Gumby's Shouting Page ... Mr. Gumby's Sleeping Page ... Mr. Gumby's Eating Page ... First Implementation At this point, we haven’t yet looked at how XML parsing works. The approach we are using here (called SAX) consists of writing a set of event handlers (just like in GUI programming) and then letting an existing XML parser call these handlers as it reads the XML document.
- Page 404 and 405: CHAPTER 18 ■ ■ ■ Packaging Yo
- Page 406 and 407: CHAPTER 18 ■ PACKAGING YOUR PROGR
- Page 408 and 409: CHAPTER 18 ■ PACKAGING YOUR PROGR
- Page 410 and 411: CHAPTER 18 ■ PACKAGING YOUR PROGR
- Page 412 and 413: CHAPTER 19 ■ ■ ■ Playful Prog
- Page 414 and 415: CHAPTER 19 ■ PLAYFUL PROGRAMMING
- Page 416 and 417: CHAPTER 19 ■ PLAYFUL PROGRAMMING
- Page 418 and 419: CHAPTER 19 ■ PLAYFUL PROGRAMMING
- Page 420: CHAPTER 19 ■ PLAYFUL PROGRAMMING
- Page 423 and 424: 392 CHAPTER 20 ■ PROJECT 1: INSTA
- Page 425 and 426: 394 CHAPTER 20 ■ PROJECT 1: INSTA
- Page 427 and 428: 396 CHAPTER 20 ■ PROJECT 1: INSTA
- Page 429 and 430: 398 CHAPTER 20 ■ PROJECT 1: INSTA
- Page 431 and 432: 400 CHAPTER 20 ■ PROJECT 1: INSTA
- Page 433 and 434: 402 CHAPTER 20 ■ PROJECT 1: INSTA
- Page 435 and 436: 404 CHAPTER 20 ■ PROJECT 1: INSTA
- Page 437 and 438: 406 CHAPTER 20 ■ PROJECT 1: INSTA
- Page 439 and 440: 408 CHAPTER 20 ■ PROJECT 1: INSTA
- Page 442 and 443: CHAPTER 21 ■ ■ ■ Project 2: P
- Page 444 and 445: CHAPTER 21 ■ PROJECT 2: PAINTING
- Page 446 and 447: CHAPTER 21 ■ PROJECT 2: PAINTING
- Page 448 and 449: CHAPTER 21 ■ PROJECT 2: PAINTING
- Page 450 and 451: CHAPTER 21 ■ PROJECT 2: PAINTING
- Page 452 and 453: CHAPTER 22 ■ ■ ■ Project 3: X
- Page 456 and 457: CHAPTER 22 ■ PROJECT 3: XML FOR A
- Page 458 and 459: CHAPTER 22 ■ PROJECT 3: XML FOR A
- Page 460 and 461: CHAPTER 22 ■ PROJECT 3: XML FOR A
- Page 462 and 463: CHAPTER 22 ■ PROJECT 3: XML FOR A
- Page 464 and 465: CHAPTER 22 ■ PROJECT 3: XML FOR A
- Page 466 and 467: CHAPTER 22 ■ PROJECT 3: XML FOR A
- Page 468: CHAPTER 22 ■ PROJECT 3: XML FOR A
- Page 471 and 472: 440 CHAPTER 23 ■ PROJECT 4: IN TH
- Page 473 and 474: 442 CHAPTER 23 ■ PROJECT 4: IN TH
- Page 475 and 476: 444 CHAPTER 23 ■ PROJECT 4: IN TH
- Page 477 and 478: 446 CHAPTER 23 ■ PROJECT 4: IN TH
- Page 479 and 480: 448 CHAPTER 23 ■ PROJECT 4: IN TH
- Page 481 and 482: 450 CHAPTER 23 ■ PROJECT 4: IN TH
- Page 483 and 484: 452 CHAPTER 23 ■ PROJECT 4: IN TH
- Page 486 and 487: CHAPTER 24 ■ ■ ■ Project 5: A
- Page 488 and 489: CHAPTER 24 ■ PROJECT 5: A VIRTUAL
- Page 490 and 491: CHAPTER 24 ■ PROJECT 5: A VIRTUAL
- Page 492 and 493: CHAPTER 24 ■ PROJECT 5: A VIRTUAL
- Page 494 and 495: CHAPTER 24 ■ PROJECT 5: A VIRTUAL
- Page 496 and 497: CHAPTER 24 ■ PROJECT 5: A VIRTUAL
- Page 498 and 499: CHAPTER 24 ■ PROJECT 5: A VIRTUAL
- Page 500 and 501: CHAPTER 24 ■ PROJECT 5: A VIRTUAL
- Page 502 and 503: CHAPTER 24 ■ PROJECT 5: A VIRTUAL
CHAPTER 22 ■ PROJECT 3: XML FOR ALL OCCASIONS 423<br />
If this does not raise an exception, you’re all set. If it does, you have <strong>to</strong> install PyXML. First,<br />
download the PyXML package from http://sf.net/projects/pyxml. There you can find RPM<br />
packages for Linux, binary installers for Windows, and source distributions for other platforms.<br />
The RPMs are installed with rpm --install, and the binary Windows distribution is installed<br />
simply by executing it. The source distribution is installed through the standard <strong>Python</strong> installation<br />
mechanism, the Distribution Utilities (Distutils). Simply unpack the tar.gz file, change<br />
<strong>to</strong> the unpacked direc<strong>to</strong>ry, and execute the following:<br />
$ python setup.py install<br />
You should now be able <strong>to</strong> use the XML <strong>to</strong>ols.<br />
■Tip There are plenty of XML <strong>to</strong>ols for <strong>Python</strong> out there. One very interesting alternative <strong>to</strong> the “standard”<br />
PyXML framework is Fredrik Lundh’s ElementTree (and the C implementation, cElementTree), available from<br />
http://effbot.org/zone. It’s quite powerful and easy <strong>to</strong> use, and may well be worth a look if you’re<br />
serious about using XML in <strong>Python</strong>.<br />
Preparations<br />
Before you can write the program that processes your XML files, you need <strong>to</strong> design your XML<br />
format. What tags do you need, what attributes should they have, and which tags should go<br />
where? To find out, let’s first consider what it is you want your XML <strong>to</strong> describe.<br />
The main concepts are Web site, direc<strong>to</strong>ry, page, name, title, and contents:<br />
• Web site. You won’t be s<strong>to</strong>ring any information about the Web site itself, so this is just<br />
the <strong>to</strong>p-level element enclosing all the files and direc<strong>to</strong>ries.<br />
• Direc<strong>to</strong>ry. A direc<strong>to</strong>ry is mainly a container for files and other direc<strong>to</strong>ries.<br />
• Page. This is a single Web page.<br />
• Name. Both direc<strong>to</strong>ries and Web pages need names—these will be used as direc<strong>to</strong>ry<br />
names and file names as they will appear in the file system and the corresponding URLs.<br />
• Title. Each Web page should have a title (not the same as its file name).<br />
• Contents. Each Web page will also have some contents. You’ll just use plain XHTML <strong>to</strong><br />
represent the contents here—that way, you can just pass it through <strong>to</strong> the final Web<br />
pages and let the browsers interpret it.<br />
In short: your document will consist of a single website element, containing several direc<strong>to</strong>ry<br />
and page elements, each of the direc<strong>to</strong>ry elements optionally containing more pages and direc<strong>to</strong>ries.<br />
The direc<strong>to</strong>ry and page elements will have an attribute called name, which will contain<br />
their name. In addition, the page tag has a title attribute. The page element contains XHTML<br />
code (of the type found inside the XHTML body tag). An example file is shown in Listing 22-1.