Starting with XML - Part 1 of 7

What is XML and why use it?

HTML is the foundation of the WWW and is well suited to presenting web pages of every kind. Problems arise when a large site needs a consistent look and feel with variable content. Creating 30 almost identical HTML files can be straightforward - maintaining each one, and making sure that a change in one is reflected in all the others, is among the most laborious and error-prone tasks in web site design.

Enter XML. CSS began the process of separating the data (HTML) from the presentation (CSS), and sharing one external CSS file across a range of HTML files greatly improved the situation. However, any change to the common HTML structure still had to be repeated in every single file. XML continues the separation of data from presentation, to the point that one XSL stylesheet contains all the common HTML code and a separate CSS stylesheet contains the formatting code. The XML file itself only needs to contain the content that is specific to that page.
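
As a rough illustration of that split, a page-specific XML file might look something like the sketch below. The element names (page, title, section, heading, para) and the file name site.xsl are invented for this example and are not taken from the CodeHelp site; only the xml-stylesheet processing instruction is standard.

```xml
<?xml version="1.0"?>
<!-- Hypothetical page file: only content unique to this page lives here. -->
<!-- site.xsl is an assumed name for the shared stylesheet. -->
<?xml-stylesheet type="text/xsl" href="site.xsl"?>
<page>
  <title>Starting with XML</title>
  <section>
    <heading>What is XML and why use it?</heading>
    <para>Text that appears only on this page.</para>
  </section>
</page>
```

Nothing in such a file describes layout, navigation or colours; all of that is left to the stylesheets it links to.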

The CodeHelp XML site uses XML to reduce the total site size by 50% by removing the need to duplicate the basic HTML structure of each page. This HTML code is held in the XSL stylesheet - one file that is used to build every XML page linked to it. One file to update, one file to check. The larger the site, the greater the benefits of XML. Each page is constructed from the stylesheet, with only the customised data loaded from the XML file. Custom written tags provide total control over where and how the data is included. The XML files used in CodeHelp contain only 20 or so tags. All the rest of the code - backgrounds, main index page links, positioning code, mailto links and other common images - is constructed on the fly from the stylesheet. Loading time is reduced because each page reuses the same stylesheet from the browser cache instead of downloading another 12 KB of repeated data.
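
A minimal sketch of such a shared stylesheet, written in XSLT 1.0, is shown below. It assumes a page file whose root element is page, with title, section, heading and para children; those element names and the file name site.css are made up for illustration, and the real CodeHelp stylesheet is certainly more elaborate.

```xml
<?xml version="1.0"?>
<!-- Hypothetical shared stylesheet (XSLT 1.0). Element names are assumptions. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- The common HTML skeleton is written once, here. -->
  <xsl:template match="/page">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
        <!-- site.css is an assumed name for the shared CSS file. -->
        <link rel="stylesheet" type="text/css" href="site.css"/>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="section"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each custom tag gets one rule that decides where its data goes. -->
  <xsl:template match="section">
    <h2><xsl:value-of select="heading"/></h2>
    <xsl:apply-templates select="para"/>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

Changing the skeleton in the first template changes every page that links to the stylesheet, which is the "one file to update" benefit described above.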

A note about XML, standards and browsers

The CodeHelp site uses XSL - eXtensible Stylesheet Language - which is a transformation language, not a simple formatting language like CSS. Microsoft Internet Explorer 5 can apply an XSL stylesheet to an XML file. CodeHelp links each page to both a CSS and an XSL stylesheet, and IE picks the XSL version. Other XML capable browsers (like Opera 4) only use the CSS stylesheet. XSL is a W3C standard which comes in two parts - a "transformation language" used for preparing documents for display, and a "formatting object set" that is used for actual visual styling. The formatting object set should still be considered a work in progress. The transformation language, however, is the part that the CodeHelp site relies on. It is the ability to transform the XML that provides the benefits: a smaller site (by reducing duplication of code) and the ability to write new pages in XML (less typing and fewer errors) and output accurate, reliable HTML 4.
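
Linking one XML file to both kinds of stylesheet can be sketched as follows. The file names site.xsl and site.css and the page element are placeholders chosen for this example, not the names used on the CodeHelp site.

```xml
<?xml version="1.0"?>
<!-- Two stylesheet links in one document; both file names are assumed. -->
<?xml-stylesheet type="text/xsl" href="site.xsl"?>
<?xml-stylesheet type="text/css" href="site.css"?>
<page>
  <title>Example page</title>
</page>
```

One caveat worth checking before copying stylesheet code between browsers: IE5 as originally shipped implemented an earlier working draft of XSL rather than the final XSLT 1.0 Recommendation, and stylesheets written for that draft declare a different xsl namespace from the http://www.w3.org/1999/XSL/Transform namespace used by XSLT 1.0.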

Within the CodeHelp site, the main difference between XML with CSS and XML with XSL is the lack of hyperlinks in the CSS version - the CSS cannot transform the XML data into an <a href></a> tag; it can only format the href, title and descriptive text which the XML contains. Strangely, there is a way of asking IE5 to create a hyperlink in a CSS/XML combination using the html: namespace. However, this appears not to function in Opera. If anyone finds an XML site which has functioning links when displayed in Opera (CSS/XML or XSL/XML), please let me know.
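
To make the difference concrete, the sketch below shows how an XSLT 1.0 template could turn a custom link element into a real HTML hyperlink. The element name link and the choice to store href and title as attributes are assumptions made for this example; the CodeHelp files may hold the same three pieces of data in a different shape.

```xml
<?xml version="1.0"?>
<!-- Hypothetical fragment of a stylesheet (XSLT 1.0). -->
<!-- Assumed input in the XML page file:
     <link href="http://www.w3.org/XML/" title="W3C XML home page">XML at the W3C</link>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Build an <a> element: the braces copy attribute values from the input. -->
  <xsl:template match="link">
    <a href="{@href}" title="{@title}">
      <xsl:value-of select="."/>
    </a>
  </xsl:template>
</xsl:stylesheet>
```

CSS has no equivalent step. It can change how the text of the link element looks, but it cannot create the <a> element that makes the text clickable.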

This is part of Copyright © 1998-2004 Neil Williams
See the file about.html for copying conditions.