Beginning Python - From Novice to Professional
CHAPTER 23 ■ PROJECT 4: IN THE NEWS

When creating an NNTPSource, much of the code can be snipped from the original prototype. As you will see later in Listing 23-2, the main differences from the original are the following:

• The code has been encapsulated in the getItems method. The servername and group variables are now arguments to the constructor. A window (a time window) has also been added, instead of assuming that you want the news since yesterday (which is equivalent to setting window to 1).

• To extract the subject, a Message object from the email module is used (constructed with the message_from_string function). This is the sort of thing you might add to later versions of your program as you thumb through the docs (as mentioned in Chapter 21, in the section “Using the LinePlot Class”).

• Instead of printing each news item directly, a NewsItem object is yielded (making getItems a generator).

To show the flexibility of the design, let’s add another news source: one that can extract news items from Web pages (using regular expressions; see Chapter 10 for more information). SimpleWebSource (see Listing 23-2) takes a URL and two regular expressions (one representing titles, and one representing bodies) as its constructor arguments. In getItems, it uses the regexp method findall to find all the occurrences (titles and bodies) and zip to combine them. It then iterates over the list of (title, body) pairs, yielding a NewsItem for each. As you can see, adding new kinds of sources (or destinations, for that matter) isn’t very difficult.

To put the code to work, let’s instantiate an agent, some sources, and some destinations. In the function runDefaultSetup (which is called if the module is run as a program), several such objects are instantiated:

• A SimpleWebSource is constructed for the BBC News Web site. It uses two simple regular expressions to extract the information it needs. (Note that the layout of the HTML on these pages might change, in which case you need to rewrite the regexps. This also applies if you are using some other page: just view the HTML source and try to find a pattern that applies.)

• An NNTPSource for comp.lang.python. The time window is set to 1, so it works just like the first prototype.

• A PlainDestination, which prints all the news gathered.

• An HTMLDestination, which generates a news page called news.html.

When all of these objects have been created and added to the NewsAgent, the distribute method is called.

You can run the program like this:

    $ python newsagent2.py

The resulting news.html page is shown in Figure 23-2.
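The findall-and-zip approach described for SimpleWebSource can be sketched as follows. This is only an illustration, not the code from Listing 23-2: it is written for Python 3, the class name SimpleTextSource is hypothetical, and it takes already-downloaded text instead of a URL so that it runs without network access. The NewsItem stand-in and the sample markup are likewise assumptions.

```python
import re


class NewsItem:
    """A simple news item with a title and body text (hypothetical stand-in)."""

    def __init__(self, title, body):
        self.title = title
        self.body = body


class SimpleTextSource:
    """
    Hypothetical variant of SimpleWebSource that works on text that has
    already been downloaded, so the sketch needs no network access.
    """

    def __init__(self, text, title_pattern, body_pattern):
        self.text = text
        self.title_pattern = re.compile(title_pattern)
        self.body_pattern = re.compile(body_pattern)

    def getItems(self):
        # With one group in the pattern, findall returns a list of the
        # text captured by that group for every match.
        titles = self.title_pattern.findall(self.text)
        bodies = self.body_pattern.findall(self.text)
        # zip pairs the nth title with the nth body; any surplus
        # titles or bodies are silently dropped.
        for title, body in zip(titles, bodies):
            yield NewsItem(title, body)


# Made-up markup, for illustration only.
SAMPLE_HTML = """
<h2>First headline</h2><p>First body.</p>
<h2>Second headline</h2><p>Second body.</p>
"""

source = SimpleTextSource(SAMPLE_HTML, r'<h2>(.*?)</h2>', r'<p>(.*?)</p>')
items = list(source.getItems())
for item in items:
    print(item.title, '-', item.body)
```

Because zip stops at the shorter of the two lists, a page where the title and body patterns match a different number of times yields fewer items without raising an error; that is worth keeping in mind when the regexps need rewriting for a changed page layout.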
Figure 23-2. A news page with more than one source

The full source code of the second implementation is found in Listing 23-2.

Listing 23-2. A More Flexible News Gathering Agent (newsagent2.py)

    from __future__ import generators
    from nntplib import NNTP
    from time import strftime, time, localtime
    from email import message_from_string
    from urllib import urlopen
    import textwrap
    import re

    day = 24 * 60 * 60 # Number of seconds in one day

    def wrap(string, max=70):
        """
        Wraps a string to a maximum line width.
        """
        return '\n'.join(textwrap.wrap(string, max)) + '\n'

    class NewsAgent:
        """
        An object that can distribute news items from news
        sources to news destinations.
        """
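As a rough illustration of how the day constant and the time imports at the top of the listing might be combined with a time window, here is a minimal sketch in Python 3. The helper name window_start and its now parameter are hypothetical, and the 'yymmdd' and 'hhmmss' formats are an assumption about what the news query expects; the portion of the listing shown here does not confirm either.

```python
from time import strftime, localtime, time

day = 24 * 60 * 60  # Number of seconds in one day, as in the listing


def window_start(window, now=None):
    """
    Return (date, hour) strings for the moment `window` days before `now`.

    Hypothetical helper: the name, the `now` parameter, and the string
    formats are assumptions made for this sketch.
    """
    if now is None:
        now = time()
    # Step back `window` days and convert to local time.
    start = localtime(now - window * day)
    return strftime('%y%m%d', start), strftime('%H%M%S', start)


# A window of 1 corresponds to "the news since yesterday".
date, hour = window_start(1)
print(date, hour)
```

Passing a larger window simply moves the starting point further back, so a window of 7 would ask for roughly a week of news.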