XML Explanations
What is this XML stuff? Marking time, marking up.
In this very short presentation I am going to cover the basics of XML.
We could try to describe what XML is, but it is best to go through some examples:
Chris Jennings - PageToScreen
XML is...
eXtensible Markup Language
It is a markup language
Here is some marked-up text:
- <name>Chris Jennings</name>
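To see what "marked up" means to a program, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module (the choice of Python is ours, not part of the presentation). The tag tells the software what the data is, and the text between the tags is the data itself.

```python
import xml.etree.ElementTree as ET

# The marked-up text from the slide: a <name> element wrapping some text
snippet = "<name>Chris Jennings</name>"

# Parse the string; the result is the root (and only) element
element = ET.fromstring(snippet)

# The tag labels the data; the text is the data being labelled
print(element.tag)   # name
print(element.text)  # Chris Jennings
```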
Let's back up a bit and look at a well-known type of markup:
HTML
Hypertext Markup Language
- was built specifically for web browsers
- in its early iterations (up to HTML 4.01), it was normal for structure and presentation to be mixed together within the same document
Here is an example of HTML markup:
- <P>Chris <B>Jennings</B></P> <!-- You have seen this type of thing before -->
- As you can see, this contains style: my surname is presented in bold.
- HTML was designed to work this way, so tags such as <i> (italic) and <font> were used to define style
XHTML - no style please
- XML does not contain information about how content should look; instead, it labels the data it contains with useful descriptions.
- XHTML is a newer version of HTML that applies the rules of XML to markup displayed in web browsers.
- Let us look at an example of XHTML, and through this we can explore the structure of XML
- It is difficult to show code in the presentation, so take a look at this commented PDF and the resulting HTML
- This web page is validated at w3.org
XML is different from XHTML
But... XHTML, like any XML document, must conform to:
- well-formedness
- syntactic correctness
- XHTML can (and should) use semantic markup but, unlike XML, it must use the set of HTML tags that browsers understand in order to display.
- Both need a single root element, but in XHTML this must be <html></html>
- XML uses tags defined by the author or owner, and the rules for these tags can be set out in a DTD
- XML does not contain style information. XHTML uses CSS to define styles
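Well-formedness can be checked mechanically: a parser either accepts the markup or rejects it. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are our own illustrations. Note that this parser checks well-formedness only; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if the string parses as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

samples = {
    # Properly nested, every tag closed, single root: well-formed
    "nested": "<p>Chris <b>Jennings</b></p>",
    # Tags overlap instead of nesting: not well-formed
    "overlapping": "<p>Chris <b>Jennings</p></b>",
    # HTML-style empty element with no closing slash: not well-formed
    "unclosed": "<p>Line one<br>Line two</p>",
    # XML is case-sensitive, so </p> does not close <P>
    "case": "<P>Chris</p>",
    # Two top-level elements means there is no single root
    "two_roots": "<p>One</p><p>Two</p>",
}

for label, markup in samples.items():
    print(f"{label}: {is_well_formed(markup)}")
```

Only the first sample passes. Writing the empty element as `<br/>` instead of `<br>` would make the "unclosed" sample well-formed.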
Playing Tag
Using XML in Publishing means:
- Structuring documents and components with meaningful tags
- Structure and semantic tagging are separated from style
- To be sure that the data is not only well-formed but also valid, we use a DTD to provide rules for the elements and attributes
- A DTD is a Document Type Definition
- A DTD is a kind of SCHEMA for XML documents
- A DTD can be external and shared. It can even be public
- Namespaces identify which vocabulary (source) a tag belongs to, so tags with the same name from different sources do not clash
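Python's standard library cannot validate a document against a DTD, so this sketch covers only the namespace point. Two vocabularies both define a `<title>` tag, and the namespace URI keeps them apart. Both URIs and the `catalogue` document are hypothetical examples, and the sketch uses Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both use <title>; the prefixes bind them to different
# namespace URIs (both URIs are hypothetical examples)
doc = """
<catalogue xmlns:bk="http://example.com/ns/books"
           xmlns:web="http://example.com/ns/web">
  <bk:title>Marking Up Books</bk:title>
  <web:title>Catalogue home page</web:title>
</catalogue>
"""

root = ET.fromstring(doc.strip())

# Map short prefixes to the namespace URIs for searching
ns = {"bk": "http://example.com/ns/books", "web": "http://example.com/ns/web"}

book_title = root.find("bk:title", ns)
page_title = root.find("web:title", ns)

# ElementTree stores each tag as {namespace-uri}local-name
print(book_title.tag)   # {http://example.com/ns/books}title
print(book_title.text)  # Marking Up Books
print(page_title.text)  # Catalogue home page
```

What identifies a vocabulary is the URI, not the prefix. The prefixes in `ns` only need to map to the same URIs as the document's prefixes.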
What can we do with XML?
- To display XML data we need to transform it into some mark-up that devices like web browsers understand

- To transform XML, we process it with an XSLT stylesheet, which matches parts of the data and turns them into something else, such as HTML
- XSL stands for Extensible Stylesheet Language; XSLT (XSL Transformations) is the part of XSL that does the transforming
- Here is a PDF that includes a sample XML with its DTD and an XSL that will transform it into HTML
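Python's standard library has no XSLT processor (third-party libraries such as lxml provide one), so this sketch does not use XSLT. Instead, it hand-codes in Python the same idea that an XSLT stylesheet expresses declaratively: match each part of the source data and output new markup for it. The `<books>` data and the function name `books_to_html` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source data: meaningful tags, no style information
source = """<books>
  <book><title>First Title</title><author>A. Writer</author></book>
  <book><title>Second Title</title><author>B. Author</author></book>
</books>"""

def books_to_html(xml_text: str) -> str:
    """Turn <books> data into an HTML list, one matching rule at a time."""
    books = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    # Rule: each <book> becomes an <li>
    for book in books.findall("book"):
        li = ET.SubElement(ul, "li")
        # Rule: the <title> becomes <em>, followed by the author's name
        em = ET.SubElement(li, "em")
        em.text = book.findtext("title")
        em.tail = " by " + book.findtext("author")
    return ET.tostring(ul, encoding="unicode")

print(books_to_html(source))
```

The source XML is never changed. To get a different output format, you would write different rules against the same data.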
What else can we do with XML?
- XML can store data
- XML can act as a data store, so data can be extracted and manipulated
- We can use software tools to edit XML
- We can present information from XML
- We can manipulate the data through the DOM - Document Object Model
- With XPointer we can locate elements within an XML document
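Here is a minimal sketch of manipulating XML through the DOM with Python's standard `xml.dom.minidom`. It locates elements by tag name, changes a value in place, and adds a new element. XPointer is not shown. The book data, the price rise and the `isbn` element are hypothetical.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<books><book><title>First Title</title><price>10</price></book></books>"
)

# Locate elements in the node tree by tag name and change their text in place
for price in doc.getElementsByTagName("price"):
    old = int(price.firstChild.data)
    price.firstChild.data = str(old + 2)  # hypothetical price rise

# Create a new element through the DOM and attach it to the first <book>
book = doc.getElementsByTagName("book")[0]
isbn = doc.createElement("isbn")
isbn.appendChild(doc.createTextNode("unknown"))
book.appendChild(isbn)

print(doc.documentElement.toxml())
```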
Publishing with XML
- If we define a taxonomy for our content, we can provide clearly defined blocks for that content
- A taxonomy for a book list may be:
- Books
- Style (or how we want it to look) is completely separate, so we should use meaningful words rather than words that represent style
- So YES to a block called title but NO to a block called big red font
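Here is a minimal sketch of a book-list taxonomy built with Python's `xml.etree.ElementTree`. The `books > book > title, author` structure and the sample entries are hypothetical, because the slide's taxonomy is not given in full. Every tag name describes what the content is, and none describes how it should look.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy: books > book > title, author
catalogue = [
    {"title": "First Title", "author": "A. Writer"},
    {"title": "Second Title", "author": "B. Author"},
]

books = ET.Element("books")
for entry in catalogue:
    book = ET.SubElement(books, "book")
    # Meaningful names ("title"), never style names ("big_red_font")
    for field in ("title", "author"):
        ET.SubElement(book, field).text = entry[field]

xml_out = ET.tostring(books, encoding="unicode")
print(xml_out)
```

How a `<title>` looks (big, red, or anything else) is decided later by a separate stylesheet or transformation, and the data itself does not change.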
Other Flavours of XML
- DocBook is a SCHEMA used by many publishers. It includes a public DTD that defines a taxonomy.
- RSS (Really Simple Syndication) is a web feed format specified in XML.
- XML is often used to store user data and configuration information
- DTDs can be open source and improved over time
- The XHTML DTDs (published by the W3C at w3.org) are public DTDs
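An RSS feed is ordinary XML, so any XML parser can read it. Here is a minimal sketch using Python's `xml.etree.ElementTree` on a small, hypothetical RSS 2.0 feed. It reads the channel title and then each item's title and link.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 feed with one item
feed = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <description>A hypothetical feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first-post</link>
    </item>
  </channel>
</rss>"""

channel = ET.fromstring(feed).find("channel")
print(channel.findtext("title"))

# Each <item> is one entry in the feed
for item in channel.findall("item"):
    print(item.findtext("title"), item.findtext("link"))
```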
InDesign - Style and Structure?
- with InDesign you can build a taxonomy for the publication using the Tags panel and the Structure pane
- paragraph and character styles can be mapped to tags to build the structure
- content will be tagged and can be customised
- the document can be exported as XML and as tagged PDF
- InDesign can import XML and use a DTD