Working with XML in POCO.

pocoproject.org

Overview

> XML
> Simple API for XML (SAX)
> Document Object Model (DOM)
> Creating XML documents


XML

> eXtensible Markup Language, a free, open standard
> a general-purpose specification for creating custom markup languages
> its purpose is to help information systems share structured data, especially over the Internet
> documents must be well-formed (e.g. every start tag must be closed by a matching end tag)
> valid: the document can be checked against a schema (validation is not supported by POCO)
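
The well-formedness rule for start and end tags can be sketched with a stack. This is a toy check in standard C++, not part of POCO; the function name tagsBalanced is hypothetical, and it assumes only plain <name>, </name> and <name/> tags (no attributes, comments, CDATA or declarations).

```cpp
#include <stack>
#include <string>

// Toy check (not POCO): returns true if every start tag is closed by a
// matching end tag in properly nested order.
bool tagsBalanced(const std::string& xml)
{
    std::stack<std::string> open;              // names of currently open elements
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos)
    {
        std::size_t end = xml.find('>', pos);
        if (end == std::string::npos) return false;        // unterminated tag
        std::string tag = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty()) return false;
        if (tag[0] == '/')                                  // end tag
        {
            if (open.empty() || open.top() != tag.substr(1)) return false;
            open.pop();
        }
        else if (tag[tag.size() - 1] != '/')                // start tag, not <name/>
        {
            open.push(tag);
        }
    }
    return open.empty();                       // nothing may be left open
}
```

A real parser checks much more (a single root element, attribute syntax, character ranges), so this only illustrates the nesting rule.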


XML – Vocabulary

> XML declaration (optional)
  <?xml version="1.0"?>
> root element
  the topmost element of an XML document
> elements and attributes
  <element attribute="value">Element Content</element>
> Comments
  <!-- a comment -->


XML – Vocabulary (cont.)

> Processing instruction
  carries information for the application and is not really of interest to the XML parser (exception: the XML declaration)
  <?target data?>
> CDATA
  used to escape blocks of text containing characters which would otherwise be recognized as markup
  <![CDATA[ text with <markup> characters ]]>
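
As a small illustration of CDATA escaping, the sketch below wraps arbitrary text in a CDATA section. It is plain standard C++, not a POCO function; the name toCDATA is hypothetical. The one special case is the sequence "]]>", which would end the section early and is therefore split across two sections.

```cpp
#include <string>

// Toy helper (not POCO): wraps text in a CDATA section. A literal "]]>"
// inside the text is split so that it never appears within one section.
std::string toCDATA(const std::string& text)
{
    static const std::string open  = "<![CDATA[";
    static const std::string close = "]]>";
    std::string result = open;
    std::size_t pos = 0;
    std::size_t hit;
    while ((hit = text.find(close, pos)) != std::string::npos)
    {
        result.append(text, pos, hit - pos);   // text before the "]]>"
        result += "]]" + close + open;         // emit "]]", close, reopen
        pos = hit + 2;                         // continue at the '>' of "]]>"
    }
    result.append(text, pos, std::string::npos);
    result += close;
    return result;
}
```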


XML Programming Interfaces

> POCO supports two interfaces for working with (reading and writing) XML data:
  > The Simple API for XML, Version 2
  > The Document Object Model


The Simple API for XML (SAX)

> SAX was originally a Java-only API for reading XML data.
> The API has been developed by a group of volunteers, not by an "official" standardization group.
> The current version of the API is 2.0.2 (since April 2004).
> POCO supports a C++ variant of the original Java API.
> For more information: http://www.saxproject.org


Event-driven Parsing

> SAX is an event-driven interface.
> The XML document is not loaded into memory as a whole for parsing.
> Instead, the parser scans the XML document, and for every XML construct (element, text, processing instruction, etc.) it finds, it calls a certain member function of a handler object.
> SAX basically defines the interfaces of these handler objects, as well as the interface you use to start and configure the parser.


[Figure: a small XML document containing the text "some text", annotated with the SAX events reported while parsing it, in order: startDocument(), startElement(), startElement(), characters(), endElement(), startElement(), endElement(), endElement(), endDocument().]
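
To make the event-driven idea concrete, here is a toy scanner in standard C++ that fires one callback per construct. It is a sketch only, not POCO's parser: ToyHandler, toyParse and EventLog are hypothetical names, and the scanner assumes nothing but <name>, </name> and text (no attributes, entities or comments).

```cpp
#include <string>
#include <vector>

// Hypothetical handler interface, loosely modelled on a SAX content handler.
class ToyHandler
{
public:
    virtual ~ToyHandler() {}
    virtual void startDocument() = 0;
    virtual void endDocument() = 0;
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Toy scanner: walks the input once and calls the handler for each
// construct it finds. The document is never built up in memory.
void toyParse(const std::string& xml, ToyHandler& handler)
{
    handler.startDocument();
    std::size_t pos = 0;
    while (pos < xml.size())
    {
        if (xml[pos] == '<')
        {
            std::size_t end = xml.find('>', pos);
            if (end == std::string::npos) break;            // malformed: stop
            std::string tag = xml.substr(pos + 1, end - pos - 1);
            if (!tag.empty() && tag[0] == '/') handler.endElement(tag.substr(1));
            else handler.startElement(tag);
            pos = end + 1;
        }
        else
        {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            handler.characters(xml.substr(pos, next - pos));
            pos = next;
        }
    }
    handler.endDocument();
}

// Example handler that simply records the sequence of events.
class EventLog: public ToyHandler
{
public:
    std::vector<std::string> events;
    void startDocument() { events.push_back("startDocument"); }
    void endDocument() { events.push_back("endDocument"); }
    void startElement(const std::string& n) { events.push_back("start:" + n); }
    void endElement(const std::string& n) { events.push_back("end:" + n); }
    void characters(const std::string& t) { events.push_back("chars:" + t); }
};
```

Passing an EventLog to toyParse() records the events in the order in which the scanner meets the corresponding constructs, which is the same pattern a SAX handler sees.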


SAX Interfaces

> Attributes
  (access attribute values by index or name)
> ContentHandler
  (startElement(), endElement(), characters(), ...)
> DeclHandler
  (partly supported for reporting entity declarations)
> DTDHandler
  (notationDecl(), unparsedEntityDecl())
> LexicalHandler
  (startDTD(), endDTD(), startCDATA(), endCDATA(), comment())


    #include "Poco/SAX/ContentHandler.h"

    // Make the Poco::XML names used in the signatures below visible.
    using Poco::XML::XMLString;
    using Poco::XML::XMLChar;
    using Poco::XML::Attributes;
    using Poco::XML::Locator;

    class MyHandler: public Poco::XML::ContentHandler
    {
    public:
        MyHandler();

        void setDocumentLocator(const Locator* loc);
        void startDocument();
        void endDocument();
        void startElement(
            const XMLString& namespaceURI,
            const XMLString& localName,
            const XMLString& qname,
            const Attributes& attributes);
        void endElement(
            const XMLString& uri,
            const XMLString& localName,
            const XMLString& qname);
        void characters(const XMLChar ch[], int start, int length);
        void ignorableWhitespace(const XMLChar ch[], int start, int len);
        void processingInstruction(
            const XMLString& target,
            const XMLString& data);
        void startPrefixMapping(
            const XMLString& prefix,
            const XMLString& uri);
        void endPrefixMapping(const XMLString& prefix);
        void skippedEntity(const XMLString& name);
    };


DeclHandler

> optional handler for DTD declarations in an XML file
  > DTD: Document Type Declaration
  > a sort of simple schema language

[Example: an XML document with an internal DTD, with element content "Peter Schojer", "15/03/1976" and "Male".]


    #include "Poco/SAX/DeclHandler.h"

    using Poco::XML::XMLString;

    class MyDeclHandler: public Poco::XML::DeclHandler
    {
    public:
        MyDeclHandler();

        void attributeDecl(
            const XMLString& eName,
            const XMLString& aName,
            const XMLString* valueDefault,
            const XMLString* value);
        void elementDecl(const XMLString& name, const XMLString& model);
        void externalEntityDecl(
            const XMLString& name,
            const XMLString* publicId,
            const XMLString& systemId);
        void internalEntityDecl(
            const XMLString& name,
            const XMLString& value);
    };


DTDHandler

> handles the DTD events not handled by DeclHandler:
  > unparsed entities
  > notations
> all are reported between startDocument and the first startElement


DTD Entity

> variables used to define shortcuts to text
> either as an internal entity, referenced in the document as &owner;
> or as an external entity
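
The "shortcut" nature of an internal entity can be shown with a toy expansion function in standard C++. This is a sketch, not how POCO resolves entities; expandEntities and EntityMap are hypothetical names, and any entity values used with it are made up for illustration.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> EntityMap;   // entity name -> replacement text

// Toy expansion (not POCO): replaces each &name; reference whose name is
// in the map with its replacement text; unknown references are kept.
std::string expandEntities(const std::string& text, const EntityMap& entities)
{
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size())
    {
        std::size_t amp = text.find('&', pos);
        if (amp == std::string::npos) break;
        std::size_t semi = text.find(';', amp);
        if (semi == std::string::npos) break;
        out.append(text, pos, amp - pos);                   // text before the reference
        EntityMap::const_iterator it =
            entities.find(text.substr(amp + 1, semi - amp - 1));
        if (it != entities.end()) out += it->second;        // known: substitute
        else out.append(text, amp, semi - amp + 1);         // unknown: keep as written
        pos = semi + 1;
    }
    out.append(text, pos, std::string::npos);               // remaining tail
    return out;
}
```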


DTD Notation

> allows you to include data in your XML file which is not XML


    #include "Poco/SAX/DTDHandler.h"

    using Poco::XML::XMLString;

    class MyDTDHandler: public Poco::XML::DTDHandler
    {
    public:
        MyDTDHandler();

        void notationDecl(
            const XMLString& name,
            const XMLString* publicId,
            const XMLString* systemId);
        void unparsedEntityDecl(
            const XMLString& name,
            const XMLString* publicId,
            const XMLString& systemId,
            const XMLString& notationName);
    };


LexicalHandler

> optional extension handler for SAX that provides lexical information about an XML document:
  > comments
  > CDATA sections
> reports the start and end of DTD sections and entities


    #include "Poco/SAX/LexicalHandler.h"

    using Poco::XML::XMLString;
    using Poco::XML::XMLChar;

    class MyLexicalHandler: public Poco::XML::LexicalHandler
    {
    public:
        MyLexicalHandler();

        void startDTD(
            const XMLString& name,
            const XMLString& publicId,
            const XMLString& systemId);
        void endDTD();
        void startEntity(const XMLString& name);
        void endEntity(const XMLString& name);
        void startCDATA();
        void endCDATA();
        void comment(const XMLChar ch[], int start, int length);
    };


SAX Parser Configuration

> XMLReader defines the interface of the parser.
> Methods for registering handlers
  (setContentHandler(), etc.)
> Methods for parser configuration:
  > setFeature(), getFeature()
    e.g. for enabling/disabling namespaces support
  > setProperty(), getProperty()
    e.g. for registering LexicalHandler, DeclHandler


SAX Namespaces Support

feature      feature              Namespace   Local   QName
namespaces   namespace-prefixes   URI         Name
false        false                –           –       ✔
true         false                ✔           ✔       –
true         true                 ✔           ✔       ✔
false        true                 –           –       ✔
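
The table can be read as a small rule: the namespace URI and local name are reported exactly when the namespaces feature is on, and the qualified name is reported unless namespaces is on with namespace-prefixes off. The sketch below encodes only that reading of the table in standard C++; ReportedNames and reportedNames are hypothetical names, not POCO API.

```cpp
// Which name parts a handler can expect, according to the table above.
struct ReportedNames
{
    bool namespaceURI;
    bool localName;
    bool qname;
};

// Toy encoding of the feature table (not a POCO function).
ReportedNames reportedNames(bool namespaces, bool namespacePrefixes)
{
    ReportedNames r;
    r.namespaceURI = namespaces;                    // URI needs namespace processing
    r.localName    = namespaces;                    // so does the local name
    r.qname        = namespacePrefixes || !namespaces;
    return r;
}
```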


    #include "Poco/SAX/SAXParser.h"
    #include "Poco/SAX/ContentHandler.h"
    #include "Poco/SAX/LexicalHandler.h"
    #include "Poco/Exception.h"
    #include <iostream>

    using namespace Poco::XML;

    class MyHandler: public ContentHandler, public LexicalHandler
    {
        [...]
    };

    MyHandler handler;
    SAXParser parser;
    parser.setFeature(XMLReader::FEATURE_NAMESPACES, true);
    parser.setFeature(XMLReader::FEATURE_NAMESPACE_PREFIXES, true);
    parser.setContentHandler(&handler);
    // setProperty() takes a void*, so cast to the exact handler interface first.
    parser.setProperty(XMLReader::PROPERTY_LEXICAL_HANDLER,
        static_cast<LexicalHandler*>(&handler));
    try
    {
        parser.parse("test.xml");
    }
    catch (Poco::Exception& e)
    {
        // The statement is truncated in the source; printing displayText() is an assumption.
        std::cerr << e.displayText() << std::endl;
    }


The Document Object Model

> The Document Object Model is an API specified by the World Wide Web Consortium (W3C).
> DOM uses a tree representation of the XML document.
> The entire document has to be loaded into memory.
> You can modify the XML document directly.


[Figure: DOM tree of a small XML document containing the text "some text"; node types shown: Document, Element, Text, Attr.]


EventTarget
  Node
    Document
    Element
    CharacterData
      Text
        CDATASection
      Comment
    ProcessingInstruction


Navigating the DOM

> Node has
  > parentNode()
  > firstChild(), lastChild()
  > nextSibling(), previousSibling()
> NodeIterator for document-order traversal:
  nextNode(), previousNode()
> TreeWalker for arbitrary navigation:
  parentNode(), firstChild(), lastChild(), etc.
> NodeIterator and TreeWalker support node filtering
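
The navigation links and document-order traversal can be sketched with a toy tree in standard C++. This is not POCO's DOM: ToyNode, nextInDocumentOrder and documentOrder are hypothetical names, and nodes are plain structs whose lifetime the caller manages.

```cpp
#include <string>
#include <vector>

// Toy node carrying the same navigation links a DOM Node exposes.
struct ToyNode
{
    std::string name;
    ToyNode* parent;
    ToyNode* firstChild;
    ToyNode* lastChild;
    ToyNode* nextSibling;
    ToyNode* previousSibling;

    explicit ToyNode(const std::string& n)
        : name(n), parent(0), firstChild(0), lastChild(0),
          nextSibling(0), previousSibling(0) {}

    // Links child in as the new last child of this node.
    void appendChild(ToyNode* child)
    {
        child->parent = this;
        child->previousSibling = lastChild;
        if (lastChild) lastChild->nextSibling = child;
        else firstChild = child;
        lastChild = child;
    }
};

// Successor in document (pre-order) order within the subtree of root,
// or 0 when the traversal is finished.
ToyNode* nextInDocumentOrder(ToyNode* node, ToyNode* root)
{
    if (node->firstChild) return node->firstChild;      // descend first
    while (node && node != root)
    {
        if (node->nextSibling) return node->nextSibling;
        node = node->parent;                            // climb until a sibling exists
    }
    return 0;
}

// Collects node names in document order, starting at root.
std::vector<std::string> documentOrder(ToyNode* root)
{
    std::vector<std::string> names;
    for (ToyNode* n = root; n; n = nextInDocumentOrder(n, root))
        names.push_back(n->name);
    return names;
}
```

A filtering iterator would apply a predicate to each node before returning it; that part is left out of the sketch.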


Memory Management in the DOM

> DOM Nodes are reference counted.
> If you create a new node and add it to a document, the document increments its reference count. So use an AutoPtr.
> You only get ownership of non-tree objects implementing the NamedNodeMap and NodeList interface.
  You have to release them (or use an AutoPtr).
> The document keeps ownership of nodes you remove from the tree. These nodes end up in the document's AutoReleasePool.
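
Why a scoped holder is recommended for newly created nodes can be seen in a toy model of reference counting. The sketch is standard C++ and does not use POCO: CountedNode, ScopedRef and ToyTree are hypothetical stand-ins for a node, an AutoPtr-like holder and a document, under the assumption that the creator starts with one reference and the tree adds its own.

```cpp
// Toy reference-counted node: the creator starts out holding one reference.
class CountedNode
{
public:
    static int alive;                        // live objects, for demonstration only
    CountedNode(): _rc(1) { ++alive; }
    void duplicate() { ++_rc; }
    void release() { if (--_rc == 0) delete this; }
    int referenceCount() const { return _rc; }
private:
    ~CountedNode() { --alive; }              // only release() may destroy a node
    int _rc;
};
int CountedNode::alive = 0;

// Minimal scoped holder: takes over the creator's reference and releases
// it when the holder goes out of scope.
class ScopedRef
{
public:
    explicit ScopedRef(CountedNode* p): _p(p) {}
    ~ScopedRef() { if (_p) _p->release(); }
    CountedNode* get() const { return _p; }
private:
    ScopedRef(const ScopedRef&);             // non-copyable in this sketch
    ScopedRef& operator=(const ScopedRef&);
    CountedNode* _p;
};

// Stand-in for a document: it adds its own reference to a node it stores.
// Copying a ToyTree is not supported in this sketch.
class ToyTree
{
public:
    ToyTree(): _child(0) {}
    ~ToyTree() { if (_child) _child->release(); }
    void appendChild(CountedNode* n)
    {
        n->duplicate();                      // the tree now shares ownership
        if (_child) _child->release();       // drop a previously stored node
        _child = n;
    }
private:
    CountedNode* _child;
};
```

Without the holder, the creator's initial reference would never be released and the node would outlive the tree.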


    #include "Poco/DOM/DOMParser.h"
    #include "Poco/DOM/Document.h"
    #include "Poco/DOM/NodeIterator.h"
    #include "Poco/DOM/NodeFilter.h"
    #include "Poco/DOM/AutoPtr.h"
    #include "Poco/SAX/InputSource.h"
    #include <fstream>
    #include <iostream>
    [...]
    std::ifstream in("test.xml");
    Poco::XML::InputSource src(in);
    Poco::XML::DOMParser parser;
    Poco::AutoPtr<Poco::XML::Document> pDoc = parser.parse(&src);
    // Visit element nodes only, in document order.
    Poco::XML::NodeIterator it(pDoc, Poco::XML::NodeFilter::SHOW_ELEMENT);
    Poco::XML::Node* pNode = it.nextNode();
    while (pNode)
    {
        // The loop body is truncated in the source; printing the node name is an assumption.
        std::cout << pNode->nodeName() << std::endl;
        pNode = it.nextNode();
    }


Creating XML Documents

> You can create an XML document by:
  > building a DOM document from scratch, or
  > by using the XMLWriter class,
  > or by generating the XML yourself.
> XMLWriter supports a SAX interface for generating XML data.
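
What "a SAX interface for generating XML" means can be sketched with a toy writer in standard C++: the caller issues the same kind of events a parser would report, and the writer turns them into markup. ToyWriter is a hypothetical class, not Poco::XML::XMLWriter, and it assumes elements without attributes or namespaces.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Toy SAX-style writer (not POCO): each call appends markup to a buffer.
class ToyWriter
{
public:
    void startElement(const std::string& name)
    {
        _out << '<' << name << '>';
        _open.push_back(name);               // remember it for endElement()
    }

    // Writes text, escaping the characters that would be read as markup.
    void characters(const std::string& text)
    {
        for (std::size_t i = 0; i < text.size(); ++i)
        {
            switch (text[i])
            {
            case '&': _out << "&amp;"; break;
            case '<': _out << "&lt;"; break;
            case '>': _out << "&gt;"; break;
            default:  _out << text[i];
            }
        }
    }

    // Closes the most recently opened element; ignored if none is open.
    void endElement()
    {
        if (_open.empty()) return;
        _out << "</" << _open.back() << '>';
        _open.pop_back();
    }

    std::string str() const { return _out.str(); }

private:
    std::ostringstream _out;
    std::vector<std::string> _open;          // stack of open element names
};
```

Because the writer tracks open elements itself, the output stays properly nested even though the caller never builds a tree.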


    #include "Poco/DOM/Document.h"
    #include "Poco/DOM/Element.h"
    #include "Poco/DOM/Text.h"
    #include "Poco/DOM/AutoPtr.h" // typedef to Poco::AutoPtr
    #include "Poco/DOM/DOMWriter.h"
    #include "Poco/XML/XMLWriter.h"
    #include <iostream>

    using namespace Poco::XML;

    // Build the tree: root with two children, each holding a text node.
    AutoPtr<Document> pDoc = new Document;
    AutoPtr<Element> pRoot = pDoc->createElement("root");
    pDoc->appendChild(pRoot);
    AutoPtr<Element> pChild1 = pDoc->createElement("child1");
    AutoPtr<Text> pText1 = pDoc->createTextNode("text1");
    pChild1->appendChild(pText1);
    pRoot->appendChild(pChild1);
    AutoPtr<Element> pChild2 = pDoc->createElement("child2");
    AutoPtr<Text> pText2 = pDoc->createTextNode("text2");
    pChild2->appendChild(pText2);
    pRoot->appendChild(pChild2);

    // Serialize the document to standard output.
    DOMWriter writer;
    writer.setNewLine("\n");
    writer.setOptions(XMLWriter::PRETTY_PRINT);
    writer.writeNode(std::cout, pDoc);


    #include "Poco/XML/XMLWriter.h"
    #include "Poco/SAX/AttributesImpl.h"
    #include <fstream>

    using namespace Poco::XML;

    std::ofstream str("test.xml");
    XMLWriter writer(str, XMLWriter::WRITE_XML_DECLARATION |
        XMLWriter::PRETTY_PRINT);
    writer.setNewLine("\n");
    writer.startDocument();
    // addAttribute(namespaceURI, localName, qname, type, value)
    AttributesImpl attrs;
    attrs.addAttribute("", "", "a1", "", "v1");
    attrs.addAttribute("", "", "a2", "", "v2");
    writer.startElement("urn:mynamespace", "root", "", attrs);
    writer.startElement("", "", "sub");
    writer.endElement("", "", "sub");
    writer.endElement("urn:mynamespace", "root", "");
    writer.endDocument();


Copyright © 2006-2010 by Applied Informatics Software Engineering GmbH.
Some rights reserved.
www.appinf.com | info@appinf.com
T +43 4253 32596 | F +43 4253 32096
