Working with XML in POCO.
XML<br />
Working with XML in POCO.
Overview<br />
> XML<br />
> Simple API for XML (SAX)<br />
> Document Object Model (DOM)<br />
> Creating XML documents
XML<br />
> eXtensible Markup Language, a free, open standard<br />
> a general-purpose specification for creating custom markup<br />
languages<br />
> its purpose is to help information systems share structured data,<br />
especially over the net<br />
> documents must be well-formed (e.g. every start tag must<br />
be closed by a matching end tag)<br />
> valid: the document can be checked against a schema<br />
(validation is not supported by POCO)
XML – Vocabulary<br />
> XML declaration (optional)<br />
<br />
> root element<br />
the topmost element of an XML document<br />
> elements and attributes:<br />
<br />
Element Content<br />
<br />
> Comments<br />
XML – Vocabulary (cont)<br />
> Processing Instruction<br />
carries information for the application and is not really of interest to<br />
the XML parser (exception: the XML declaration)<br />
<br />
> CDATA<br />
used to escape blocks of text containing characters which would<br />
otherwise be recognized as markup<br />
]]>
XML Programming Interfaces<br />
> POCO supports two interfaces for working with (reading and<br />
writing) XML data:<br />
> The Simple API for XML, Version 2<br />
> The Document Object Model
The Simple API for XML (SAX)<br />
> SAX was originally a Java-only API for reading XML data.<br />
> The API has been developed by a group of volunteers, not by an<br />
"official" standardization group.<br />
> The current version of the API is 2.0.2 (since April 2004)<br />
> POCO supports a C++ variant of the original Java API.<br />
> For more information: http://www.saxproject.org
Event-driven Parsing<br />
> SAX is an event-driven interface.<br />
> The XML document is not loaded into memory as a whole for<br />
parsing.<br />
> Instead, the parser scans the XML document, and for every XML<br />
construct (element, text, processing instruction, etc.) it finds, it calls<br />
a certain member function of a handler object.<br />
> SAX basically defines the interfaces of these handler objects, as<br />
well as the interface you use to start and configure the parser.
Figure: SAX events reported for a sample document (markup lost; only the text content "some text" survives), in order:<br />
startDocument()<br />
startElement()<br />
startElement()<br />
characters()<br />
endElement()<br />
startElement()<br />
endElement()<br />
endElement()<br />
endDocument()
SAX Interfaces<br />
> Attributes<br />
(access attribute values by index or name)<br />
> ContentHandler<br />
(startElement(), endElement(), characters(), ...)<br />
> DeclHandler<br />
(partly supported for reporting entity declarations)<br />
> DTDHandler<br />
(notationDecl(), unparsedEntityDecl())<br />
> LexicalHandler<br />
(startDTD(), endDTD(), startCDATA(), endCDATA(), comment())
#include "Poco/SAX/ContentHandler.h"<br />
class MyHandler: public Poco::XML::ContentHandler<br />
{<br />
public:<br />
    MyHandler();<br />
    void setDocumentLocator(const Locator* loc);<br />
    void startDocument();<br />
    void endDocument();<br />
    void startElement(<br />
        const XMLString& namespaceURI,<br />
        const XMLString& localName,<br />
        const XMLString& qname,<br />
        const Attributes& attributes);<br />
    void endElement(<br />
        const XMLString& uri,<br />
        const XMLString& localName,<br />
        const XMLString& qname);<br />
    void characters(const XMLChar ch[], int start, int length);<br />
    void ignorableWhitespace(const XMLChar ch[], int start, int len);<br />
    void processingInstruction(<br />
        const XMLString& target,<br />
        const XMLString& data);<br />
    void startPrefixMapping(<br />
        const XMLString& prefix,<br />
        const XMLString& uri);<br />
    void endPrefixMapping(const XMLString& prefix);<br />
    void skippedEntity(const XMLString& name);<br />
};
DeclHandler<br />
> optional handler for DTD declarations in an XML file<br />
> DTD: Document Type Definition<br />
> sort of a simple schema language<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Peter Schojer<br />
15/03/1976<br />
Male<br />
<br />
#include "Poco/SAX/DeclHandler.h"<br />
class MyDeclHandler: public Poco::XML::DeclHandler<br />
{<br />
public:<br />
    MyDeclHandler();<br />
    void attributeDecl(<br />
        const XMLString& eName,<br />
        const XMLString& aName,<br />
        const XMLString* valueDefault,<br />
        const XMLString* value);<br />
    void elementDecl(const XMLString& name, const XMLString& model);<br />
    void externalEntityDecl(<br />
        const XMLString& name,<br />
        const XMLString* publicId,<br />
        const XMLString& systemId);<br />
    void internalEntityDecl(<br />
        const XMLString& name,<br />
        const XMLString& value);<br />
};
DTDHandler<br />
> handles DTD events not handled by DeclHandler<br />
> unparsed entities<br />
> notations<br />
> all reported between startDocument and first startElement
DTD Entity<br />
> variables used to define shortcuts to text<br />
> either as an internal entity<br />
<br />
&owner;<br />
> or as an external entity<br />
DTD Notation<br />
> allows you to include data in your XML file which is not XML<br />
<br />
<br />
>
#include "Poco/SAX/DTDHandler.h"<br />
class MyDTDHandler: public Poco::XML::DTDHandler<br />
{<br />
public:<br />
    MyDTDHandler();<br />
    void notationDecl(<br />
        const XMLString& name,<br />
        const XMLString* publicId,<br />
        const XMLString* systemId);<br />
    void unparsedEntityDecl(<br />
        const XMLString& name,<br />
        const XMLString* publicId,<br />
        const XMLString& systemId,<br />
        const XMLString& notationName);<br />
};
LexicalHandler<br />
> optional extension handler for SAX to provide lexical information<br />
about an XML document<br />
> comments<br />
> CDATA sections<br />
> reports start and end of DTD sections and entities
#include "Poco/SAX/LexicalHandler.h"<br />
class MyLexicalHandler: public Poco::XML::LexicalHandler<br />
{<br />
public:<br />
    MyLexicalHandler();<br />
    void startDTD(<br />
        const XMLString& name,<br />
        const XMLString& publicId,<br />
        const XMLString& systemId);<br />
    void endDTD();<br />
    void startEntity(const XMLString& name);<br />
    void endEntity(const XMLString& name);<br />
    void startCDATA();<br />
    void endCDATA();<br />
    void comment(const XMLChar ch[], int start, int length);<br />
};
SAX Parser Configuration<br />
> XMLReader defines the interface of the parser.<br />
> Methods for registering handlers<br />
(setContentHandler(), etc.)<br />
> Methods for parser configuration:<br />
> setFeature(), getFeature()<br />
e.g. for enabling/disabling namespaces support<br />
> setProperty(), getProperty()<br />
e.g. for registering LexicalHandler, DeclHandler
SAX Namespaces Support<br />
feature namespaces | feature namespace-prefixes | Namespace URI | Local Name | QName<br />
false | false | – | – | ✔<br />
true | false | ✔ | ✔ | –<br />
true | true | ✔ | ✔ | ✔<br />
false | true | – | – | ✔
class MyHandler: public ContentHandler, public LexicalHandler<br />
{<br />
    [...]<br />
};<br />
MyHandler handler;<br />
SAXParser parser;<br />
parser.setFeature(XMLReader::FEATURE_NAMESPACES, true);<br />
parser.setFeature(XMLReader::FEATURE_NAMESPACE_PREFIXES, true);<br />
parser.setContentHandler(&handler);<br />
// the property value is passed as void*, so cast to LexicalHandler* first<br />
parser.setProperty(XMLReader::PROPERTY_LEXICAL_HANDLER,<br />
    static_cast<LexicalHandler*>(&handler));<br />
try<br />
{<br />
    parser.parse("test.xml");<br />
}<br />
catch (Poco::Exception& e)<br />
{<br />
    std::cerr << e.displayText() << std::endl;<br />
}
The Document Object Model<br />
> The Document Object Model is an API specified by the World<br />
Wide Web Consortium (W3C)<br />
> DOM uses a tree representation of the XML document<br />
> The entire document has to be loaded into memory<br />
> You can modify the XML document directly
Figure: DOM tree of a sample document (markup lost; only the text content "some text" survives). Node types shown:<br />
Document<br />
Element<br />
Text<br />
Element<br />
Element<br />
Attr<br />
Attr
Figure: DOM class hierarchy<br />
> Node derives from EventTarget<br />
> Document, Element, CharacterData and ProcessingInstruction derive from Node<br />
> Text and Comment derive from CharacterData<br />
> CDATASection derives from Text
Navigating the DOM<br />
> Node has<br />
> parentNode()<br />
> firstChild(), lastChild()<br />
> nextSibling(), previousSibling()<br />
> NodeIterator for document-order traversal:<br />
nextNode(), previousNode()<br />
> TreeWalker for arbitrary navigation:<br />
parentNode(), firstChild(), lastChild(), etc.<br />
> NodeIterator and TreeWalker support node filtering
Memory Management in the DOM<br />
> DOM Nodes are reference counted.<br />
> If you create a new node and add it to a document, the<br />
document increments its reference count. So use an AutoPtr.<br />
> You only get ownership of non-tree objects implementing the<br />
NamedNodeMap and NodeList interface.<br />
You have to release them (or use an AutoPtr).<br />
> The document keeps ownership of nodes you remove from the<br />
tree. These nodes end up in the document's AutoReleasePool.
#include "Poco/DOM/DOMParser.h"<br />
#include "Poco/DOM/Document.h"<br />
#include "Poco/DOM/NodeIterator.h"<br />
#include "Poco/DOM/NodeFilter.h"<br />
#include "Poco/DOM/AutoPtr.h"<br />
#include "Poco/SAX/InputSource.h"<br />
#include <fstream><br />
#include <iostream><br />
[...]<br />
std::ifstream in("test.xml");<br />
Poco::XML::InputSource src(in);<br />
Poco::XML::DOMParser parser;<br />
Poco::AutoPtr<Poco::XML::Document> pDoc = parser.parse(&src);<br />
// visit element nodes only, in document order<br />
Poco::XML::NodeIterator it(pDoc, Poco::XML::NodeFilter::SHOW_ELEMENT);<br />
Poco::XML::Node* pNode = it.nextNode();<br />
while (pNode)<br />
{<br />
    std::cout << pNode->nodeName() << ":" << pNode->nodeValue() << std::endl;<br />
    pNode = it.nextNode();<br />
}
Creating XML Documents<br />
> You can create an XML document by:<br />
> building a DOM document from scratch, or<br />
> by using the XMLWriter class,<br />
> or by generating the XML yourself.<br />
> XMLWriter supports a SAX interface for generating XML data.
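If you generate the XML yourself, character data has to be escaped by hand so that it is not mistaken for markup. The following is a minimal sketch using only the standard library; `escapeXml` and `makeElement` are hypothetical helpers, not POCO functions, and `makeElement` assumes the element name is already a valid XML name.

```cpp
#include <string>

// Hypothetical helper: replaces the five characters with predefined XML entities.
std::string escapeXml(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size());
    for (std::string::size_type i = 0; i < raw.size(); ++i)
    {
        switch (raw[i])
        {
        case '&':  out += "&amp;";  break;
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '"':  out += "&quot;"; break;
        case '\'': out += "&apos;"; break;
        default:   out += raw[i];   break;
        }
    }
    return out;
}

// Hypothetical helper: builds <name>text</name> with the text escaped.
std::string makeElement(const std::string& name, const std::string& text)
{
    return "<" + name + ">" + escapeXml(text) + "</" + name + ">";
}
```

Using XMLWriter or a DOM document instead takes care of this escaping, which is one reason to prefer them over hand-built strings.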
#include "Poco/DOM/Document.h"<br />
#include "Poco/DOM/Element.h"<br />
#include "Poco/DOM/Text.h"<br />
#include "Poco/DOM/AutoPtr.h" //typedef to Poco::AutoPtr<br />
#include "Poco/DOM/DOMWriter.h"<br />
#include "Poco/XML/XMLWriter.h"<br />
#include <iostream><br />
using namespace Poco::XML;<br />
AutoPtr<Document> pDoc = new Document;<br />
AutoPtr<Element> pRoot = pDoc->createElement("root");<br />
pDoc->appendChild(pRoot);<br />
AutoPtr<Element> pChild1 = pDoc->createElement("child1");<br />
AutoPtr<Text> pText1 = pDoc->createTextNode("text1");<br />
pChild1->appendChild(pText1);<br />
pRoot->appendChild(pChild1);<br />
AutoPtr<Element> pChild2 = pDoc->createElement("child2");<br />
AutoPtr<Text> pText2 = pDoc->createTextNode("text2");<br />
pChild2->appendChild(pText2);<br />
pRoot->appendChild(pChild2);<br />
// serialize the finished tree to standard output<br />
DOMWriter writer;<br />
writer.setNewLine("\n");<br />
writer.setOptions(XMLWriter::PRETTY_PRINT);<br />
writer.writeNode(std::cout, pDoc);
#include "Poco/XML/XMLWriter.h"<br />
#include "Poco/SAX/AttributesImpl.h"<br />
#include <fstream><br />
std::ofstream str("test.xml");<br />
XMLWriter writer(str, XMLWriter::WRITE_XML_DECLARATION |<br />
    XMLWriter::PRETTY_PRINT);<br />
writer.setNewLine("\n");<br />
writer.startDocument();<br />
AttributesImpl attrs;<br />
attrs.addAttribute("", "", "a1", "", "v1");<br />
attrs.addAttribute("", "", "a2", "", "v2");<br />
writer.startElement("urn:mynamespace", "root", "", attrs);<br />
writer.startElement("", "", "sub");<br />
writer.endElement("", "", "sub");<br />
writer.endElement("urn:mynamespace", "root", "");<br />
writer.endDocument();
Copyright © 2006-2010 by Applied Informatics Software Engineering GmbH.<br />
Some rights reserved.<br />
www.appinf.com | info@appinf.com<br />
T +43 4253 32596 | F +43 4253 32096