... in Java by Richard G Baldwin

Java JAXP, Transforming XML to XHTML

Baldwin shows you how to use XSLT to transform an XML document into an XHTML document. He also shows you how to write Java code to perform the same transformation.

Published: August 24, 2004
By Richard G. Baldwin

Java Programming Notes # 2210

Preface
Preview
Some Details Regarding XHTML
Some Details Regarding XSLT
Discussion and Sample Code

The XSLT Transformation
The Java Code Transformation

Run the Program
Summary
What's Next?
Complete Program Listings

Preface

In the previous lesson entitled Java JAXP, Writing Java Code to Emulate an XSLT Transformation, I showed you how to write a Java program that mimics an XSLT transformation for converting an XML file into a text file. I also showed that once you have a library of Java methods that emulate XSLT elements, it is no more difficult to write a Java program to transform an XML document than it is to write an XSL stylesheet to transform the same document.

In this lesson, I will show you how to use XSLT to transform an XML document into an XHTML document. I will also show you how to write Java code that performs the same transformation.

This lesson is one in a series designed to teach you how to use JAXP and Sun's Java Web Services Developer Pack (JWSDP).

The first lesson in the series was entitled Java API for XML Processing (JAXP), Getting Started. As mentioned above, the previous lesson was entitled Java JAXP, Writing Java Code to Emulate an XSLT Transformation.

JAXP, XML, XSL, XSLT, W3C, and XHTML, a Review

JAXP is an API designed to help you write programs for creating and processing XML documents. It is a critical part of Sun's Java Web Services Developer Pack (JWSDP).

XML is an acronym for the eXtensible Markup Language. I will assume that you already understand XML, and will teach you how to use JAXP to write programs for creating and processing XML documents.

XSL is an acronym for Extensible Stylesheet Language. XSLT is an acronym for XSL Transformations.

The numerous uses of XSLT include the following:

Transforming non-XML documents into XML documents.
Transforming XML documents into other XML documents.
Transforming XML documents into non-XML documents.

This lesson explains a Java program that transforms an XML document into an XHTML document.

An XHTML document is an XML document that provides a rigorous alternative to the use of an HTML document. According to the W3C, XHTML 1.0 is a "Reformulation of HTML 4 in XML 1.0."

Viewing tip

You may find it useful to open another copy of this lesson in a separate browser window. That will make it easier for you to scroll back and forth among the different listings and figures while you are reading about them.

Supplementary material

I recommend that you also study the other lessons in my extensive collection of online Java and XML tutorials. You will find those lessons published at Gamelan.com. As of the date of this writing, Gamelan doesn't maintain a consolidated index of my tutorial lessons, and sometimes they are difficult to locate there. You will find a consolidated index at www.DickBaldwin.com.

Preview

A tree structure in memory

A DOM parser can be used to create a tree structure in memory that represents an XML document. In Java, that tree structure is encapsulated in an object of the interface type Document.

Many operations are possible

Given an object of type Document (often called a DOM tree), there are many methods that can be invoked on the object to perform a variety of operations.
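As a minimal sketch of the idea above, the following code parses a small XML string into a Document object and invokes a few methods on the resulting tree. The class name DomParseSketch and the sample XML text are assumptions made for illustration only. They are not taken from the programs discussed in this lesson.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class DomParseSketch {

    // Parses XML text into a DOM tree. Checked exceptions are
    // wrapped so that callers need no throws clause.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<A><Q>A Big Header</Q><B><C>Text block 1.</C></B></A>");
        Element root = doc.getDocumentElement();
        // Name of the root element
        System.out.println(root.getNodeName());
        // Number of child nodes of the root element
        System.out.println(root.getChildNodes().getLength());
        // Text content of the first child of the root element
        System.out.println(root.getFirstChild().getTextContent());
    }
}
```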

Two ways to transform an XML document

There are at least two ways to transform the contents of an XML document into another document:

By writing Java code to manipulate the DOM tree and perform the transformation.
By using XSLT to perform the transformation.

A skeleton library of Java methods

This is one of several lessons that show you how to write the skeleton of a Java library containing methods that emulate the most common XSLT elements. Once you have the library, writing Java code to transform XML documents consists mainly of writing a short driver program to access those methods. Given the proper library of methods, it is no more difficult to write a Java program to perform the transformation than it is to write an XSLT stylesheet.

Library is not my primary purpose

However, my primary purpose in these lessons is not to provide such a library, but rather to help you understand how to use a DOM tree to create, modify, and manipulate XML documents. By comparing Java code that manipulates a DOM tree with similar XSLT operations, you will have an opportunity to learn a little about XSLT in the process of learning how to manipulate a DOM tree using Java code.

Some Details Regarding XHTML

XHTML documents, a special case

An XHTML document is an XML document. It is a rigorous alternative to an HTML document.

One of the interesting uses of XSLT is the transformation of XML documents into XHTML documents. This makes it possible to render the information contained in an XML document using an XHTML-compatible Web browser.

Where does the transformation take place?

When transforming an XML document for rendering with an XHTML browser, the transformation can take place anywhere between the source of the XML document and the browser.

Transforming on the server

For example, a transformation program can be written in Java and run on a web server as a servlet, or it can be written as a JavaBeans component and accessed from a scriptlet in JavaServer Pages (JSP).

Transforming at the browser

The transformation can also be performed by the browser. For example, Microsoft IE 6.0 and XSLT can be used for this purpose.

Will transform XML into XHTML

This and the next several lessons will illustrate parallel Java code and XSLT transformations to transform XML documents into XHTML documents. The sample programs will illustrate various aspects of the manipulation of a DOM tree using Java code.

Requirements for XHTML documents

According to Web Design & Development Using XHTML by Griffin, Morales, and Finnegan, an XHTML document differs from an HTML document in the following ways:

XHTML documents must be well-formed.
Element and attribute names must be in lower case.
Non-empty elements require end tags.
Attribute values must always be quoted.
XHTML documents have no attribute minimization.
XHTML documents end empty elements.
XHTML documents use elements with id and name attributes.
XHTML documents use Document Type Declarations.
XHTML documents use XML namespaces.

Although it is not a requirement, an XHTML document often has an XML declaration at the beginning to identify the document as an XML document.
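Several of the requirements listed above (end tags, quoted attribute values, and terminated empty elements) follow directly from the rule that the document must be well-formed XML. The sketch below checks only that one rule, simply by asking a JAXP parser to parse a fragment. The class name WellFormedCheck and the sample fragments are assumptions for illustration. The check says nothing about lower-case names, the DTD, or namespaces.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class WellFormedCheck {

    // Returns true if the markup can be parsed as well-formed XML.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the markup
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup failed", e);
        }
    }

    public static void main(String[] args) {
        // HTML-style empty element with no end tag
        System.out.println(isWellFormed("<p>Line one<br>Line two</p>"));
        // XHTML-style terminated empty element
        System.out.println(isWellFormed("<p>Line one<br />Line two</p>"));
        // Unquoted attribute value
        System.out.println(isWellFormed("<p class=intro>Text</p>"));
    }
}
```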

Some Details Regarding XSLT

Previous lessons in this series have provided quite a bit of detailed information regarding the operation of XSLT. Therefore, this discussion will be brief.

Assume that an XML document has been parsed to produce a DOM tree in memory that represents the XML document.

Execute template rules

An XSLT processor starts examining the DOM tree at its root node. It obtains instructions from the XSLT stylesheet telling it how to navigate the tree, and how to treat each node that it encounters along the way.

As each node is encountered, the processor searches the stylesheet looking for a template rule that governs how to treat nodes of that type. If the processor finds a template rule that matches the node type, it performs the operations indicated by the template rule. Otherwise, it executes a built-in template rule appropriate to that node.

Literal text in template rules

If the template rule being applied contains literal text, that literal text is used to create text in the output.
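The following sketch shows literal text from a template rule appearing in the output. It uses the standard JAXP Transformer classes with a tiny stylesheet and XML document that I made up for this purpose (they are not Dom03.xsl or Dom03.xml). The class name LiteralTextSketch is also hypothetical. The text output method is used here only to keep the result easy to read.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class name, used only for this illustration.
class LiteralTextSketch {

    // Applies the stylesheet text to the XML text and returns the output.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // "Text: " and ";" are literal text inside the template rule.
        String xsl =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='C'>"
          + "Text: <xsl:apply-templates/>;"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        System.out.println(transform("<B><C>one</C><C>two</C></B>", xsl));
    }
}
```

The elements B and the document root have no matching template rule in this sketch, so the processor reaches each C element through the built-in template rules.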

Traversal of the DOM tree

There are at least two XSLT elements that can be used to traverse the children of a context node:

xsl:apply-templates
xsl:for-each

The xsl:apply-templates element

The xsl:apply-templates element was discussed in detail in previous lessons.

The xsl:for-each element

The xsl:for-each element executes an iterative examination of all child nodes of the context node that match a required select attribute. As each child node is examined, it is processed using XSLT elements that form the content of the xsl:for-each element in the template rule.

This lesson will include examples that use the xsl:for-each element in addition to the xsl:apply-templates element. The lesson will also explain a Java method that emulates the xsl:for-each element.
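To give a rough idea of what such an emulation might look like, here is a minimal sketch of a Java method that visits the child elements of a context node whose names match a select value. This is not the method from the Dom03 program. The names ForEachSketch, forEach, and listItems are hypothetical, and the select value is treated as a plain element name rather than a full XPath expression. The body uses getTextContent, which is closer to xsl:value-of than to xsl:apply-templates.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class and method names, used only for this illustration.
class ForEachSketch {

    // Visits, in document order, each child element of the context
    // node whose name equals select, and runs body on it.
    static void forEach(Node context, String select, Consumer<Element> body) {
        for (Node child = context.getFirstChild();
             child != null;
             child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && child.getNodeName().equals(select)) {
                body.accept((Element) child);
            }
        }
    }

    // Builds one <li> per matching child of the root element.
    static String listItems(String xml, String select) {
        StringBuilder out = new StringBuilder();
        forEach(parse(xml).getDocumentElement(), select,
            e -> out.append("<li>").append(e.getTextContent()).append("</li>"));
        return out.toString();
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<D><E>e1</E><F>f1</F><E>e2</E><F>f2</F></D>";
        System.out.println("<ul>" + listItems(xml, "E") + "</ul>");
        System.out.println("<ol>" + listItems(xml, "F") + "</ol>");
    }
}
```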

Enough talk, let's see some code

I will begin by discussing the XML file named Dom03.xml (shown in Listing 24 near the end of the lesson) along with the XSL stylesheet file named Dom03.xsl (shown in Listing 25).

A Java program named Dom03

After explaining the transformation produced by applying this stylesheet to this XML document, I will explain the transformation produced by processing the XML file with a Java program named Dom03 (shown in Listing 23) that mimics the behavior of the XSLT transformation.

Discussion and Sample Code

The XML file named Dom03.xml

The XML file shown in Listing 24 is relatively straightforward. A tree view of the XML file is shown in Figure 1. (This XML file is both well-formed and valid.)

#document DOCUMENT_NODE
  A DOCUMENT_TYPE_NODE
  #comment COMMENT_NODE
  xml-stylesheet PROCESSING_INSTRUCTION_NODE
  A ELEMENT_NODE
    Q ELEMENT_NODE
      #text A Big Header
    B ELEMENT_NODE
      C ELEMENT_NODE
        #text Text block 1.
      R ELEMENT_NODE
        #text A Mid Header
      C ELEMENT_NODE
        #text Text block 2.
      #comment COMMENT_NODE
      processor PROCESSING_INSTRUCTION_NODE
      S ELEMENT_NODE
        #text A Small Header
      B ELEMENT_NODE
        C ELEMENT_NODE
          #text Text block 3.
      S ELEMENT_NODE
        #text Another Small Header
      B ELEMENT_NODE
        C ELEMENT_NODE
          #text Text block 4.
        T ELEMENT_NODE
          #text A Smallest Header
        B ELEMENT_NODE
          C ELEMENT_NODE
            #text Text block 5.
          D ELEMENT_NODE
            E ELEMENT_NODE
              #text First list item in E
              G ELEMENT_NODE
                #text Nested G text element
            F ELEMENT_NODE
              #text First list item in F
            E ELEMENT_NODE
              #text Second list item in E
            F ELEMENT_NODE
              #text Second list item in F
            E ELEMENT_NODE
              #text Third list item in E
            F ELEMENT_NODE
              #text Third list item in F
          C ELEMENT_NODE
            #text Text block 6.
        C ELEMENT_NODE
          #text Text block 7.
      R ELEMENT_NODE
        #text Another Mid Header
      C ELEMENT_NODE
        #text Text block 8.
    B ELEMENT_NODE
      R ELEMENT_NODE
        #text Another Mid Header in Another B
      C ELEMENT_NODE
        #text Text block 9.

Figure 1

(This tree view of the XML file was produced using a program named DomTree02, which was discussed in an earlier lesson.

Note that in order to make the tree view more meaningful, I manually removed extraneous line breaks and text nodes associated with those line breaks. The extraneous line breaks in Figure 1 were caused by extraneous line breaks in the XML file. The extraneous line breaks in the XML file were placed there for cosmetic reasons and to force it to fit into this narrow publication format.)
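The DomTree02 program is not reproduced in this lesson. As a rough stand-in, the following sketch walks a DOM tree recursively and produces a listing in the general style of Figure 1. The class name TreeViewSketch is hypothetical, and the decision to skip whitespace-only text nodes automatically (rather than removing them by hand) is my assumption for this sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name; this is not the DomTree02 program.
class TreeViewSketch {

    // Maps the node types that appear in Figure 1 to display names.
    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: return "DOCUMENT_NODE";
            case Node.DOCUMENT_TYPE_NODE: return "DOCUMENT_TYPE_NODE";
            case Node.COMMENT_NODE: return "COMMENT_NODE";
            case Node.PROCESSING_INSTRUCTION_NODE:
                return "PROCESSING_INSTRUCTION_NODE";
            case Node.ELEMENT_NODE: return "ELEMENT_NODE";
            default: return "OTHER_NODE";
        }
    }

    // Appends one line per node, indenting two spaces per level.
    static void print(Node node, String indent, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (text.isEmpty()) {
                return; // skip text nodes caused by cosmetic line breaks
            }
            out.append(indent).append("#text ").append(text).append('\n');
        } else {
            out.append(indent).append(node.getNodeName()).append(' ')
               .append(typeName(node)).append('\n');
        }
        for (Node child = node.getFirstChild();
             child != null;
             child = child.getNextSibling()) {
            print(child, indent + "  ", out);
        }
    }

    static String treeView(String xml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            print(doc, "", out);
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(treeView(
            "<A>\n  <Q>A Big Header</Q>\n  <B><C>Text block 1.</C></B>\n</A>"));
    }
}
```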

Content of the XML document

The structure and content of the XML document were primarily designed to illustrate various transformation concepts that I intend to explain in this lesson. However, to some extent, I designed the structure and content keeping in mind the ultimate rendering of the XHTML file that will be produced by transforming the XML file into an XHTML file.

The rendered XHTML file

At this point, I'm going to jump ahead and show you what the final XHTML file looks like when rendered using Netscape Navigator v7.1. The rendering of the XHTML file is shown in Figure 2.

(You may find it useful to compare the rendering in Figure 2 with the XML file structure and content in Figure 1. You should be able to identify text nodes in Figure 1 that match up with rendered text in Figure 2.)

Figure 2 Rendered XHTML file

The XSLT Transformation

The XSL stylesheet file named Dom03.xsl

Recall that an XSL stylesheet is itself an XML file, and can therefore be represented as a tree. Figure 3 presents an abbreviated tree view of the stylesheet shown in Listing 25. I colored each of the template rules in this view with alternating colors of red and blue to make them easier to identify.

(As is often the case with XSL stylesheets, this stylesheet file is well-formed but it is not valid.)

NOTE:  IT WAS NECESSARY TO MANUALLY ENTER SOME
LINE BREAKS IN THIS PRESENTATION TO FORCE IT TO
FIT INTO THIS NARROW PUBLICATION FORMAT.

#document DOCUMENT_NODE
  xsl:stylesheet ELEMENT_NODE
      Attribute: version=1.0
      Attribute: xmlns:xsl=http://www.w3.org/1999
                                   /XSL/Transform
    xsl:output ELEMENT_NODE
        Attribute: method=xml
        Attribute: doctype-public=-//W3C//DTD 
                       XHTML 1.0 Transitional//EN
        Attribute: doctype-system=http://www.w3.
        org/TR/xhtml1/DTD/xhtml1-transitional.dtd

    xsl:template ELEMENT_NODE
        Attribute: match=/
      html ELEMENT_NODE
        head ELEMENT_NODE
          meta ELEMENT_NODE
              Attribute: http-equiv=content-type
              Attribute: content=text/html; 
                                    charset=UTF-8
          title ELEMENT_NODE
            #text Generated XHTML file
        body ELEMENT_NODE
           table ELEMENT_NODE
              Attribute: border=2
              Attribute: cellspacing=0
              Attribute: cellpadding=0
              Attribute: width=330
              Attribute: bgcolor=#FFFF00
            tr ELEMENT_NODE
              td ELEMENT_NODE
                xsl:apply-templates ELEMENT_NODE

    xsl:template ELEMENT_NODE
        Attribute: match=B
      xsl:apply-templates ELEMENT_NODE

    xsl:template ELEMENT_NODE
        Attribute: match=C
      p ELEMENT_NODE
        xsl:apply-templates ELEMENT_NODE

    xsl:template ELEMENT_NODE
        Attribute: match=D
      #text List of items in E

      ul ELEMENT_NODE
        xsl:for-each ELEMENT_NODE
            Attribute: select=E
          li ELEMENT_NODE
            xsl:apply-templates ELEMENT_NODE
      #text List of items in F
      ol ELEMENT_NODE
        xsl:for-each ELEMENT_NODE
            Attribute: select=F
          li ELEMENT_NODE
            xsl:apply-templates ELEMENT_NODE

    xsl:template ELEMENT_NODE
        Attribute: match=G
      b ELEMENT_NODE
         xsl:apply-templates ELEMENT_NODE

    xsl:template ELEMENT_NODE
        Attribute: match=Q
      h1 ELEMENT_NODE
        xsl:apply-templates ELEMENT_NODE

    xsl:template ELEMENT_NODE
        Attribute: match=R
      h2 ELEMENT_NODE
        xsl:apply-templates ELEMENT_NODE

    xsl:template ELEMENT_NODE
        Attribute: match=S
      h3 ELEMENT_NODE
        xsl:apply-templates ELEMENT_NODE

    xsl:template ELEMENT_NODE
        Attribute: match=T
      h4 ELEMENT_NODE
        xsl:apply-templates ELEMENT_NODE

Figure 3

Why abbreviated?

I refer to this as an abbreviated tree view because I manually deleted comment nodes and extraneous text nodes in order to emphasize the important elements in the stylesheet.

(Extraneous text nodes occur as a result of inserting line breaks in the original XSL document for cosmetic purposes.

Note that I also manually entered several line breaks near the beginning to force the material to fit into this narrow publication format.)

The root element

The root node of all XML documents is the document node. In addition to the root node, there is also a root element, and it is important not to confuse the two.

As you can see from Figure 3, the root element in the XSL document is of type xsl:stylesheet. The root element has two attributes, each of which is standard for XSL stylesheets.

(Note that I manually entered a line break in the second attribute of the xsl:stylesheet node to force it to fit into this narrow publication format. I also manually entered line breaks into two of the attributes of the xsl:output element node to force them to fit into this narrow publication format.)

The first attribute provides the XSLT version. The second attribute points to the XSLT namespace URI, which you can read about in the W3C Recommendation.
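The difference between the root node and the root element, and the two items of information carried by the root element, can be examined with a few DOM method calls. The sketch below is an illustration under my own assumptions: the class name StylesheetRootSketch is hypothetical, and the stylesheet text is an empty stand-in rather than Dom03.xsl. Note that the parser must be made namespace aware before getNamespaceURI and getLocalName return useful values.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class StylesheetRootSketch {

    // Parses the text with a namespace-aware parser and returns
    // the root element (not the root node).
    static Element rootElement(String xml) {
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = rootElement(
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'/>");
        // The root element is a child of the root (document) node.
        Node rootNode = root.getParentNode();
        System.out.println(rootNode.getNodeName());
        System.out.println(root.getNodeName());
        // The XSLT version and the XSLT namespace URI
        System.out.println(root.getAttribute("version"));
        System.out.println(root.getNamespaceURI());
    }
}
```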

Children of the root element node

The root element node (xsl:stylesheet) in Figure 3 has ten child nodes, nine of which are template rules. (The xsl:output child node, shown in green, is not a template rule. I will discuss it in detail later.) I colored the template rules in alternating colors of red and blue to make them easier to identify visually.

The template rules

Each of the nine template rules has a match pattern. The nine match patterns in the order that they appear in Figure 3 are as follows:

match=/ (root node)
match=B (matches element node named B)
match=C (matches element node named C)
match=D (matches element node named D)
match=G (matches element node named G)
match=Q (matches element node named Q)
match=R (matches element node named R)
match=S (matches element node named S)
match=T (matches element node named T)

I will discuss each of the nine template rules later, but before doing that I will show you the raw XHTML output produced by this XSLT transformation.

(Note that the Java program discussed later produces essentially the same output as the XSLT transformation.)

The output from the transformation

The result of performing an XSLT transformation (by applying the XSL stylesheet shown in Listing 25 to the XML file shown in Listing 24) is shown in Figure 4. This is the raw XHTML code that was rendered in Figure 2.

I will explain the operations in the XSLT transformation that produced most of the text in Figure 4.

NOTE THAT IT WAS NECESSARY FOR ME TO MANUALLY
INSERT LINE BREAKS IN SEVERAL OF THE LONG LINES
IN THIS MATERIAL TO FORCE IT TO FIT INTO THIS
NARROW PUBLICATION FORMAT.  I ALSO MANUALLY
INSERTED LINE BREAKS AT CRITICAL POINTS TO
MAKE IT EASIER TO INTERPRET THE MATERIAL 
VISUALLY.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 
     Transitional//EN" "http://www.w3.org/TR/
     xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" 
     xml:lang="en" lang="en">
<head>
<meta http-equiv="content-type" 
     content="text/html; charset=UTF-8"/>
<title>Generated XHTML file</title>
</head>
<body>
<table border="2" cellspacing="0" cellpadding="0"
      width="330" bgcolor="#FFFF00"><tr><td>
<h1>
A Big Header
</h1>
<p>
Text block 1.
</p>
<h2>
A Mid Header
</h2>
<p>
Text block 2.
</p>
<h3>
A Small Header
</h3>
<p>
Text block 3.
</p>
<h3>
Another Small Header
</h3>
<p>
Text block 4.
</p>
<h4>
A Smallest Header
</h4>
<p>
Text block 5.
</p>
List of items in E
<ul>
<li>
First list item in E
<b>
Nested G text element
</b>
</li>
<li>
Second list item in E
</li>
<li>
Third list item in E
</li>
</ul>
List of items in F
<ol>
<li>
First list item in F
</li>
<li>
Second list item in F
</li>
<li>
Third list item in F
</li>
</ol>
<p>
Text block 6.
</p>
<p>
Text block 7.
</p>
<h2>
Another Mid Header
</h2>
<p>
Text block 8.
</p>
<h2>
Another Mid Header in Another B
</h2>
<p>
Text block 9.
</p>
</td></tr></table>
</body></html>

Figure 4

(Note that I manually deleted a couple of extraneous line breaks from the output shown in Figure 4. It was also necessary for me to manually insert line breaks in several of the long lines to force the material to fit in this narrow publication format. I also manually inserted line breaks at certain critical points to make it easier to interpret the material visually.)

Can sometimes get confusing

I will caution you up front that this discussion can become confusing, but I will do everything that I can to minimize the confusion. The problem is that the discussion will be mixing tags, attributes, and elements from the XML file with tags, attributes, and elements from the stylesheet file and the XHTML file. With so many tags, attributes, and elements being discussed, it is sometimes difficult to keep them separated in your mind.

In particular, in order to cause the output to be a valid XHTML document, it is necessary to manually insert XHTML tags, attributes, and elements in the XSL template rules, which themselves involve XML tags, attributes, and elements.

I will make heavy use of color in an attempt to minimize the confusion.

The first line of text

The first line of text in the output shown in Figure 4 is an XML declaration that is produced automatically by the XSLT transformer available with JAXP. As I mentioned earlier, such a declaration is not required, but is highly recommended by most authors.

The xsl:output element

Before getting into the template rules in Figure 3, I need to explain the xsl:output element shown in green in Figure 3 and reproduced in Figure 5 below for convenient viewing.

    xsl:output ELEMENT_NODE
        Attribute: method=xml
        Attribute: doctype-public=-//W3C//DTD 
                       XHTML 1.0 Transitional//EN
        Attribute: doctype-system=http://www.w3.
        org/TR/xhtml1/DTD/xhtml1-transitional.dtd

Figure 5

The XSL stylesheet version

Listing 1 shows the XSL code that corresponds to the tree view of the stylesheet element shown in Figure 5.

<xsl:output method="xml"
  doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
  doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />

Listing 1

(As on several previous occasions, I need to remind you that it was necessary for me to manually insert line breaks in Listing 1 to cause the material to fit in this narrow publication format.)

Literal text passes through to the output

As you learned in the previous lesson, any literal text that you include in your XSL stylesheet will be passed through to the output. As you will see later, I will cause the output to contain much of the required XHTML text simply by including that XHTML text as literal text in the stylesheet.

The stylesheet is an XML document

It is important to remember, however, that the XSL stylesheet is itself an XML document, and you cannot include any literal text that would cause a parser to reject it as an XML document. You also cannot do anything that will cause the XSLT processor to reject it as a stylesheet.

XHTML document requires a specific DTD reference

One of the things that is required in the XHTML output is the DTD reference shown in Figure 6.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 
     Transitional//EN" "http://www.w3.org/TR/
     xhtml1/DTD/xhtml1-transitional.dtd">

Figure 6

(The material in Figure 6 was extracted from Figure 4 and reproduced here for convenient viewing. This is one of three alternative DTDs that can be used with an XHTML document.)

Correct DTD for XHTML but not for stylesheet

The DTD reference in Figure 6 is a correct DTD reference for an XHTML document, but it is not a correct DTD reference for an XSL stylesheet. (In fact, stylesheets don't require a DTD and often don't have one.)

If you simply include the text from Figure 6 as literal text in the stylesheet (in hopes that it will pass through to the output), the XSLT processor will interpret it as a DTD reference for the stylesheet, and will attempt to validate the stylesheet against that reference. The stylesheet will then be declared invalid and the transformation effort will fail.

Therefore, you must find a way to cause this DTD reference to end up in the XHTML document without confusing the XSLT transformation process.

Two ways to accomplish that

I know of two ways to accomplish that objective. One way is to include the text from Figure 6 in a CDATA section in the stylesheet. This raises some other issues, but it can be made to work.

The easier way is to use the xsl:output element shown in Listing 1 to cause the DTD reference to be written into the output without confusing the parser or the XSLT processor.

The xsl:output element

Here is a partial quotation from XML in a Nutshell (which I highly recommend) by Elliotte Rusty Harold and W. Scott Means.

"The top-level xsl:output element helps determine the exact formatting of the XML document produced when the result tree is stored in a file, written onto a stream, or otherwise serialized into a sequence of bytes."

Ten optional attributes

To make a long story short, this element has ten optional attributes that are used by the XSLT processor to determine the formatting of the output. The XSLT element shown in Listing 1 specifies values for three of those optional attributes:

method
doctype-public
doctype-system

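The method attribute normally defaults to xml. However, an XSLT 1.0 processor selects the html output method by default when the root element of the result is named html and has no namespace, which is the case for the output produced by this stylesheet. Specifying method="xml" explicitly (as in Listing 1) therefore ensures that the processor serializes the result as a well-formed XML document.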

The doctype-public attribute sets the public identifier used in the document type declaration.

The doctype-system attribute sets the system identifier used in the document type declaration.

The required XHTML DTD

There are three allowable DTDs that can be used for an XHTML document:

Strict
Transitional
Frameset

I'm not going to get into the differences between these three DTDs in this lesson. Suffice it to say that I elected to use the transitional DTD for this example because it is somewhat easier to use than the other two.

The transitional DTD

Here is what the W3C has to say about the DTD for XHTML 1.0 Transitional:

This DTD module is identified by the following PUBLIC and SYSTEM identifiers:

PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
SYSTEM
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"

As you can see, these values match the doctype-public and doctype-system attribute values in Listing 1, and result in the correct output for the XHTML DTD in Figure 6.
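The following sketch shows the effect of these two attributes in isolation. It applies a very small stylesheet (my own stand-in, not Dom03.xsl) that contains the same xsl:output element and a single template rule, and then prints the serialized result, which should begin with an XML declaration followed by the document type declaration. The class name DoctypeOutputSketch is hypothetical. The exact placement of line breaks in the output depends on the XSLT processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class name, used only for this illustration.
class DoctypeOutputSketch {

    // A stand-in stylesheet containing the xsl:output element
    // from Listing 1 and one very small template rule.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml'"
      + " doctype-public='-//W3C//DTD XHTML 1.0 Transitional//EN'"
      + " doctype-system='http://www.w3.org/TR/xhtml1/DTD/"
      + "xhtml1-transitional.dtd'/>"
      + "<xsl:template match='/'>"
      + "<html><head><title>Generated XHTML file</title></head>"
      + "<body><p>Hello</p></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stand-in stylesheet to the XML text.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<A/>"));
    }
}
```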

The first template rule

The first template rule (extracted from Figure 3 and given a different color scheme) is shown in tree view in Figure 7. This template rule contains an XPath expression that matches the document root (note the forward slash).

    xsl:template ELEMENT_NODE
        Attribute: match=/
      html ELEMENT_NODE
        head ELEMENT_NODE
          meta ELEMENT_NODE
              Attribute: http-equiv=content-type
              Attribute: content=text/html; 
                                    charset=UTF-8
          title ELEMENT_NODE
            #text Generated XHTML file
        body ELEMENT_NODE
           table ELEMENT_NODE
              Attribute: border=2
              Attribute: cellspacing=0
              Attribute: cellpadding=0
              Attribute: width=330
              Attribute: bgcolor=#FFFF00
            tr ELEMENT_NODE
              td ELEMENT_NODE
                xsl:apply-templates ELEMENT_NODE

Figure 7

The template rule in XSL format

Listing 2 shows the same template rule in XSL format, (extracted from Listing 25).

<xsl:template match="/">
<html>
<head>
<meta http-equiv="content-type"
content="text/html; charset=UTF-8"/>
<title>Generated XHTML file</title>
</head>
<body>
<table border="2" cellspacing="0"
cellpadding="0" width="330"
bgcolor="#FFFF00" >
<tr>
<td>
<xsl:apply-templates/>
</td>
</tr>
</table>
</body>
</html>
</xsl:template>

Listing 2

(Note that according to most of the books that I have read, the following namespace attribute should be used on the html tag. However, something about it causes problems with the JAXP transformer so I left it off. The resulting XHTML file is still valid according to the W3C Markup Validation Service even without the namespace attribute.

xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en" lang="en")

The literal text is shown in red

From my viewpoint as the author of the stylesheet, everything that is colored red in Listing 2 is simply literal text that I want to pass through to the output so that it will become part of the raw XHTML text.

The template rule must be well-formed

However, as you can see from Figure 7, the XML parser considers all of this material to be well-formed (but not valid) XML element nodes, attribute nodes, and text nodes. Were I to make a change to any of the red literal text that would corrupt the well-formed nature of the XML code in Listing 2, the stylesheet could not be used to control an XSLT transformation. While a stylesheet is not required to be valid, it is required to be well-formed.

Must be very careful when including markup in stylesheet

Therefore, you must be very careful when you include literal markup text in the stylesheet for whatever purpose. Any markup that you include in the stylesheet must result in the stylesheet being well-formed.
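Here is a minimal sketch of what happens when literal markup breaks the well-formed nature of a stylesheet. It asks the JAXP TransformerFactory to compile two small stand-in stylesheets, one containing a properly terminated br element and one containing an HTML-style br tag with no end tag. The class name StylesheetCompileSketch and both stylesheets are assumptions for illustration. Depending on the processor, an error message may also be written to the standard error stream.

```java
import java.io.StringReader;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class name, used only for this illustration.
class StylesheetCompileSketch {

    static final String HEAD =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'>";
    static final String TAIL = "</xsl:template></xsl:stylesheet>";

    // Literal markup that keeps the stylesheet well-formed
    static final String GOOD = HEAD + "<p>Line one<br/>Line two</p>" + TAIL;
    // Literal markup that corrupts the stylesheet (no end tag for br)
    static final String BAD = HEAD + "<p>Line one<br>Line two</p>" + TAIL;

    // Returns true if the processor accepts the stylesheet.
    static boolean compiles(String xsl) {
        try {
            TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
            return true;
        } catch (TransformerConfigurationException e) {
            return false; // the stylesheet was rejected
        }
    }

    public static void main(String[] args) {
        System.out.println("Terminated br accepted: " + compiles(GOOD));
        System.out.println("Unterminated br accepted: " + compiles(BAD));
    }
}
```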

(This was not a problem with the inclusion of literal text in the stylesheet in the previous lesson, because the literal text didn't contain markup characters. As a result, the literal text was interpreted simply as text nodes in the stylesheet. As you can see from Figure 7, however, the literal markup text that was included in this stylesheet was interpreted by the parser as element nodes, attributes and text nodes.)

A very simple template rule

At first blush, this template rule appears to be very long and very complex. However, as you can see from Listing 2, once you isolate out all of the literal XHTML text that's included in the template rule, the actual XSLT template rule is very simple. This rule simply passes a lot of literal markup text through to the output and causes templates to be applied to all children of the root (document) node. (You learned what it means to apply templates in the previous lesson.)

The XHTML tags

If you are familiar with XHTML syntax, you will recognize that the literal text shown in red in Listing 2 begins with typical XHTML tags such as <html>, <head>, and <body>. These tags are required for an XHTML document. This text is sent to the output before any processing of the DOM tree is performed.

Then the literal text creates an XHTML table with a yellow background. The start tags for the table are sent to the output before the xsl:apply-templates element is executed.

All of the output produced by executing the xsl:apply-templates element is inserted into a single data <td> cell in the table.

Finally, when the xsl:apply-templates element returns, the end tags for the table and the end tags for the document are sent to the output.
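The order of events described above can be sketched in Java code that manipulates a DOM tree. This is not the Dom03 program. It is a simplified illustration under my own assumptions: the names RootRuleSketch, processDocumentNode, and applyTemplates are hypothetical, the table attributes are omitted, only the built-in behavior is emulated for the child nodes, and special characters in the text are not escaped.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class and method names; this is not the Dom03 program.
class RootRuleSketch {

    // Emulates the template rule that matches the root node.
    static void processDocumentNode(Node docNode, StringBuilder out) {
        // 1. Literal start tags go to the output first.
        out.append("<html><head><title>Generated XHTML file</title></head>");
        out.append("<body><table><tr><td>");
        // 2. Everything produced for the children lands in the td cell.
        applyTemplates(docNode, out);
        // 3. Literal end tags go to the output last.
        out.append("</td></tr></table></body></html>");
    }

    // Simplified stand-in for xsl:apply-templates that emulates only
    // the built-in rules: recurse into elements and copy text.
    static void applyTemplates(Node context, StringBuilder out) {
        for (Node child = context.getFirstChild();
             child != null;
             child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                applyTemplates(child, out);
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                out.append(child.getNodeValue());
            }
            // Comments and processing instructions produce no output.
        }
    }

    static String transform(String xml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            processDocumentNode(doc, out);
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<A><Q>A Big Header</Q></A>"));
    }
}
```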

The raw XHTML output

Figure 8 shows a condensed version of the raw XHTML output. The XHTML output shown in red in Figure 8 matches the literal text shown in red in the template rule of Listing 2.

NOTE THAT IT WAS NECESSARY FOR ME TO MANUALLY
INSERT LINE BREAKS IN SEVERAL OF THE LONG LINES
IN THIS MATERIAL TO FORCE IT TO FIT INTO THIS
NARROW PUBLICATION FORMAT.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 
     Transitional//EN" "http://www.w3.org/TR/
     xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="content-type" 
     content="text/html; charset=UTF-8"/>
<title>Generated XHTML file</title>
</head>
<body>
<table border="2" cellspacing="0" cellpadding="0"
      width="330" bgcolor="#FFFF00"><tr><td>

...HTML CODE DELETED FOR BREVITY...

</td></tr></table>
</body></html>

Figure 8

The effect of xsl:apply-templates

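Referring once again to Listing 2, we see that this template rule causes templates to be applied to all child nodes of the root or document node. A root node can have only one element child, which is the root element node. (As Figure 1 shows, the document node in this case also has comment and processing-instruction children, but the built-in template rules for those node types produce no output.) Referring back to Figure 1, we see that the root element node is named A.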

Now referring back to the tree view of the stylesheet in Figure 3 (and also the list of match patterns presented earlier), we see that the stylesheet doesn't contain a template rule that matches an element named A.

Important to understand built-in behavior

If the processor encounters a node for which there is no matching template rule, it executes a built-in template rule for that type of node. This is where it becomes important to understand the behavior of the built-in template rules, which I explained in the earlier lesson entitled Java JAXP, Implementing Default XSLT Behavior in Java.

The behavior of the built-in template rule for element nodes is to apply templates to all child nodes of the element node. Therefore, in this case, the processor will apply templates to all child nodes of the root element node named A.

Referring back to Figure 1, we see that the root element node has three child nodes, which occur in the following order: Q, B, and B. Therefore, the first node that will be processed is the node named Q.

A template rule that matches Q

Figure 9 and Listing 3 show a template rule that matches an element named Q.

    xsl:template ELEMENT_NODE
        Attribute: match=Q
      h1 ELEMENT_NODE
        xsl:apply-templates ELEMENT_NODE

Figure 9

The tree view of the template rule is shown in Figure 9. The XSL stylesheet code is shown in Listing 3.

<xsl:template match="Q">
<h1>
<xsl:apply-templates />
</h1>
</xsl:template>

Listing 3

A level 1 header in the output

This template rule sends the start and end tags for a level 1 XHTML header to the output, and inserts something between those tags by applying templates to all child nodes of the element node named Q.

Referring back to the element node named Q in Figure 1, we see that it has only one child node, and that node is a text node. Executing the xsl:apply-templates element on a text node causes the built-in version of the template rule to be applied. The built-in version gets the value of the text node and sends it to the output. This produces the raw XHTML output shown in Figure 10.

<h1>
A Big Header
</h1>

Figure 10

You should be able to easily identify the header from Figure 10 in the first line of the rendered output in Figure 2.
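The walk from the root node down to the text node inside Q can be reproduced with a few lines of JAXP code. The following is a minimal sketch and is not part of the program discussed in this lesson. The class name BuiltInRuleSketch, the method name transform, and the cut-down stylesheet and XML strings are assumptions made purely for illustration. The stylesheet contains only the template rule that matches Q, so the root node, the element named A, and the text node are all handled by built-in template rules.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BuiltInRuleSketch {
  //Hypothetical stylesheet containing a single
  // template rule, the one that matches Q.
  static final String XSL =
      "<xsl:stylesheet version=\"1.0\" xmlns:xsl="
    + "\"http://www.w3.org/1999/XSL/Transform\">"
    + "<xsl:output method=\"xml\" "
    + "omit-xml-declaration=\"yes\"/>"
    + "<xsl:template match=\"Q\">"
    + "<h1><xsl:apply-templates/></h1>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  //Transforms the XML string with the stylesheet
  // above and returns the result as a string.
  static String transform(String xml){
    try{
      Transformer transformer =
          TransformerFactory.newInstance()
              .newTransformer(new StreamSource(
                  new StringReader(XSL)));
      StringWriter result = new StringWriter();
      transformer.transform(
          new StreamSource(new StringReader(xml)),
          new StreamResult(result));
      return result.toString();
    }catch(Exception e){
      throw new IllegalStateException(e);
    }//end catch
  }//end transform

  public static void main(String[] args){
    //There is no rule for A, so the built-in
    // rule passes control down to Q.
    System.out.println(
        transform("<A><Q>A Big Header</Q></A>"));
  }//end main
}//end class BuiltInRuleSketch
```

Even though nothing in this stylesheet matches A, the header text still reaches the output wrapped in h1 tags, because the built-in rule for element nodes applies templates to the children of A.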

A template rule that matches B

That takes care of processing the root element node's child named Q. The next child to be processed is a child node named B.

A template rule that matches an element node named B is shown in Figure 11 and Listing 4.

    xsl:template ELEMENT_NODE
        Attribute: match=B
      xsl:apply-templates ELEMENT_NODE

Figure 11

As before, the tree view is shown in Figure 11 and the stylesheet code is shown in Listing 4.

<xsl:template match="B">
<xsl:apply-templates />
</xsl:template>

Listing 4

This template rule is very simple. It simply causes templates to be applied to all child nodes of the element node named B. Referring back to Figure 1, we see that the first child node named B has several child nodes, which occur in the following order: C, R, C, S, B, S, B, R, C.

An abbreviated DOM tree

Don't worry, I'm not going to discuss them all. In fact, I'm going to ignore many of those nodes and their descendants, and concentrate on the abbreviated portion of the DOM tree shown in Figure 12. I am going to concentrate on this portion because it uses XSLT templates not previously discussed in this lesson or in my earlier lessons.

    B ELEMENT_NODE
      ...
      B ELEMENT_NODE
        ...
        B ELEMENT_NODE
          ...
          D ELEMENT_NODE
            E ELEMENT_NODE
              #text First list item in E
              G ELEMENT_NODE
                #text Nested G text element
            F ELEMENT_NODE
              #text First list item in F
            E ELEMENT_NODE
              #text Second list item in E
            F ELEMENT_NODE
              #text Second list item in F
            E ELEMENT_NODE
              #text Third list item in E
            F ELEMENT_NODE
              #text Third list item in F

Figure 12

To help you keep your bearings, the first node named B in Figure 12 is the first node named B belonging to the root element node named A in Figure 1. That node named B will be the starting point for the following discussion. Nodes have been manually removed from Figure 12 at each point where you see an ellipsis (...). I will ignore those nodes.

Traversing down the DOM tree

As you saw in the template rule that matches B in Listing 4, each time the processor encounters an element node named B, templates are applied to all child nodes of that node and no other action is required. Therefore, we can immediately skip down to a discussion of the element node named D.

A template rule that matches D

Figure 13 shows a tree view of the template rule that matches D.

    xsl:template ELEMENT_NODE
        Attribute: match=D
      #text List of items in E

      ul ELEMENT_NODE
        xsl:for-each ELEMENT_NODE
            Attribute: select=E
          li ELEMENT_NODE
            xsl:apply-templates ELEMENT_NODE
      #text List of items in F
      ol ELEMENT_NODE
        xsl:for-each ELEMENT_NODE
            Attribute: select=F
          li ELEMENT_NODE
            xsl:apply-templates ELEMENT_NODE

Figure 13

The stylesheet code for the template rule that matches D is shown in Listing 5.

<xsl:template match="D">List of items in E
<ul>

<xsl:for-each select="E">
<li>
<xsl:apply-templates />
</li>
</xsl:for-each>

</ul>List of items in F
<ol>

<xsl:for-each select="F">
<li>
<xsl:apply-templates />
</li>
</xsl:for-each>

</ol>

</xsl:template>

Listing 5

In an attempt to separate the text and markup that controls the transformation process from the text and markup destined to become part of the XHTML document, I colored the latter red in Figure 13 and Listing 5. I also colored the XML comments blue in Listing 5 to make them easy to ignore.

A simpler version

In an attempt to make it even easier to understand the behavior of this template rule, I have reproduced it in Listing 6 with all literal text and all comments removed. I also added indentation to help with the visual aspect of the XSL code.

NOTE: LITERAL TEXT AND COMMENTS WERE MANUALLY
REMOVED FROM THIS TEMPLATE RULE FOR DISCUSSION
PURPOSES.

<xsl:template match="D">
  <xsl:for-each select="E">
    <xsl:apply-templates />
  </xsl:for-each>

  <xsl:for-each select="F">
    <xsl:apply-templates />
  </xsl:for-each>
</xsl:template>

Listing 6

First consider the behavior of the top half of the template rule in Listing 6. This rule is invoked whenever the processor encounters an element node named D.

<xsl:for-each select="E">

The processor identifies all child nodes of D whose name is E and processes them in the order in which they occur.

(It is also possible to process the child nodes in sorted order using a more complex implementation, but that isn't being done here. That will be the topic for a future lesson.)

<xsl:apply-templates>

The processing that is applied to each child node named E depends on the elements contained in the xsl:for-each element in the template rule. In this case, the processor is instructed to apply templates to all child nodes of each node named E.

Referring back to Figure 12, you will see that the node named D has three child nodes named E and three child nodes named F.

(I colored the child nodes named E and F, and their descendants, in alternating colors of red and blue to make them easier to identify visually.)

One of the child nodes named E has a child node named G.

No matching template rules for E or F

Referring back to the tree view of the stylesheet in Figure 3, you can see that there are no matching template rules for nodes named E or F. However, there is a matching template rule for nodes named G.

Apply built-in template rule to node E

When the processor encounters the first node named E, it will apply the built-in template rule for element nodes. That will cause it to apply templates to all child nodes of the node named E. The first child node that it will encounter will be a text node containing the following text:

First list item in E

This text will be sent to the output.

Then it will encounter the node named G and apply the matching template rule to that node. The tree view of that template rule is shown in Figure 14.

    xsl:template ELEMENT_NODE
        Attribute: match=G
      b ELEMENT_NODE
         xsl:apply-templates ELEMENT_NODE

Figure 14

The stylesheet code for the template rule that matches G is shown in Listing 7.

<xsl:template match="G">
<b>
<xsl:apply-templates />
</b>
</xsl:template>

Listing 7

This template rule applies templates to all child nodes of G, and surrounds the output produced by that operation with the XHTML start and end tags to cause that material to be displayed as bold.

Referring back to Figure 12, we see that the node named G has only one child node. It is a text node containing the following text:

Nested G text element

That text will be sent to the output next, surrounded by XHTML bold tags, <b>...</b>.

That completes the processing of the first child of D named E.

Note in Figure 12 that the next child node of D is a node named F. However, we are discussing the behavior of that portion of the template rule shown in Listing 6 that is using the xsl:for-each element to iterate on nodes named E. Therefore, the processor will skip over the node named F and process the next node named E.

This is a simple node that has only one child node and it is a text node containing the following text:

Second list item in E

This text will be the next thing to be sent to the output.

The node named D has one more child node named E, and it has a single child node, which is a text node. The text node contains the following text:

Third list item in E

When that text is sent to the output, the execution of the top half of the template rule shown in Listing 6 will be complete. Then the processor will execute the bottom half of the template rule in Listing 6. The bottom half is identical to the top half except that it iterates on child nodes named F, so I won't discuss it in detail.

Let's look at the XHTML output

Before moving along, let's take a look at the raw XHTML produced by the template rule shown in Listing 5. That XHTML output is shown in Figure 15.

List of items in E
<ul>
<li>
First list item in E
<b>
Nested G text element
</b>
</li>
<li>
Second list item in E
</li>
<li>
Third list item in E
</li>
</ul>
List of items in F
<ol>
<li>
First list item in F
</li>
<li>
Second list item in F
</li>
<li>
Third list item in F
</li>
</ol>

Figure 15

Black text originates in XML document

The black text in Figure 15 originated in the XML file shown in Figure 12. You should be able to match the seven lines of black text in Figure 15 to the corresponding text in Figure 12.

Red and blue text originates in stylesheet

The red text in Figure 15 originated in the stylesheet template rule shown in Listing 5. This literal text is also shown in red in Listing 5.

The blue text in Figure 15 originated in the template rule shown in Listing 7. This text is also shown in blue in Listing 7.

How does it render?

If you go back and examine Figure 2, which shows the XHTML as rendered by the Netscape Navigator browser, you should be able to identify the output in Figure 2 produced by the raw XHTML text in Figure 15. (It occurs between the lines that read Text block 5 and Text block 6.)

As you can see, the template rule shown in Listing 5 used an xsl:for-each element

To iterate on child nodes named E,
To extract the text values of those nodes and their descendants, and
To embed those values in XHTML elements to cause the values to be rendered as an unordered list.

The value of a child node of one of the E nodes was also caused to be rendered in bold.

Then the template rule used an xsl:for-each element

To iterate on child nodes named F,
To extract text values from those nodes, and
To embed those values in XHTML elements to cause the values to be rendered as an ordered list.
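The two xsl:for-each passes described above can be tried out in isolation with JAXP. What follows is a minimal sketch under stated assumptions, not the stylesheet used in this lesson: the class name ForEachXslSketch, the method name transform, the short XML sample, and the enclosing div element (added only so that the result has a single root) are all hypothetical. The literal text lines are omitted to keep the sketch short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ForEachXslSketch {
  //Hypothetical cut-down rules for D and G. The
  // div wrapper is an assumption of this sketch.
  static final String XSL =
      "<xsl:stylesheet version=\"1.0\" xmlns:xsl="
    + "\"http://www.w3.org/1999/XSL/Transform\">"
    + "<xsl:output method=\"xml\" "
    + "omit-xml-declaration=\"yes\"/>"
    + "<xsl:template match=\"D\"><div>"
    + "<ul><xsl:for-each select=\"E\">"
    + "<li><xsl:apply-templates/></li>"
    + "</xsl:for-each></ul>"
    + "<ol><xsl:for-each select=\"F\">"
    + "<li><xsl:apply-templates/></li>"
    + "</xsl:for-each></ol>"
    + "</div></xsl:template>"
    + "<xsl:template match=\"G\">"
    + "<b><xsl:apply-templates/></b>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  static String transform(String xml){
    try{
      Transformer transformer =
          TransformerFactory.newInstance()
              .newTransformer(new StreamSource(
                  new StringReader(XSL)));
      StringWriter result = new StringWriter();
      transformer.transform(
          new StreamSource(new StringReader(xml)),
          new StreamResult(result));
      return result.toString();
    }catch(Exception e){
      throw new IllegalStateException(e);
    }//end catch
  }//end transform

  public static void main(String[] args){
    //E and F children are interleaved, as they
    // are in Figure 12.
    System.out.println(transform(
        "<D><E>e1<G>g</G></E><F>f1</F>"
      + "<E>e2</E><F>f2</F></D>"));
  }//end main
}//end class ForEachXslSketch
```

Although the E and F children are interleaved in the input, all of the E items land in the unordered list and all of the F items land in the ordered list, because each xsl:for-each element selects only the children that match its select attribute.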

New XSLT material has been covered

I could go on for hours discussing the interaction of this stylesheet with the XML file in the transformation process. However, a review of the tree view of the stylesheet in Figure 3 reveals that the behavior of the remaining template rules has either been covered in this lesson or in a previous lesson. Therefore, I will terminate this discussion of the XSLT transformation at this point and discuss a Java program that mimics the behavior of this XSLT transformation.

The Java Code Transformation

At this point, I will change direction and concentrate on Java code instead of XSLT elements. The following paragraphs describe a Java program named Dom03, which emulates the XSLT transformation described above. This program transforms an XML file into an XHTML file using a combination of recursive and iterative processing. Along the way, it creates and populates an XHTML table.

This program defines a new method named forEach that mimics the behavior of the xsl:for-each element described above. In addition, this program adds code to the processDocumentNode and processNode methods to emulate the template rules in the XSL file named Dom03.xsl.

Also, as was the case in the previous lessons, this program implements six built-in template rules for an XSLT processor.

Instructions for creating a custom template rule

To create a custom template rule for this program:

Go to the processNode method.
Identify the node type.
Change the conditional clause in the if statement to implement the required match.
Write code in the body of the if statement to implement the custom rule.

If the modified conditional clause evaluates to true, the custom rule will be executed. If the modified conditional clause evaluates to false, the default rule will be executed. You will see examples of several custom template rules in this program.

Behavior of the program

This program compares the transformation of a specified XML file into a result file, using two different approaches:

An XSLT style sheet and transformation, as discussed above.
Program code that emulates the behavior of the XSLT transformation.

In particular, this program illustrates Java code that emulates the XSLT templates in the file named Dom03.xsl.

Both output files are valid

The program produces two output files, one from the XSLT transformation, and one from executing the Java code. Both files validate as XHTML transitional at the W3C validation service,
http://validator.w3.org/file-upload.html.

Both also validate as HTML files at
http://www.htmlhelp.com/tools/validator/upload.html.

Finally, both files validate using the program named DomTree02, which means that they validate as XML under JAXP.

Usage instructions

The program requires three command line arguments in the following order:

The name of the input XML file - must be Dom03.xml.
The name of the output file to be produced by the XSLT transformation.
The name of the output file to be produced by the program code that emulates the XSLT transformation.

The name of the XSL stylesheet file is extracted from the processing instruction in the XML file, but you could easily modify the program to obtain the name of that file from a command-line argument.
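One way to pull the stylesheet name out of the processing instruction is sketched below. This is an illustration under stated assumptions and is not the code used in Dom03: the class name StylesheetPiSketch, the method name findStylesheetHref, and the regular expression used to pick the href pseudo-attribute out of the processing instruction data are all hypothetical.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class StylesheetPiSketch {
  //Matches href="..." or href='...' in PI data.
  private static final Pattern HREF =
      Pattern.compile(
          "href\\s*=\\s*[\"']([^\"']*)[\"']");

  //Returns the href of the first xml-stylesheet
  // processing instruction found among the
  // children of the document node, or null.
  static String findStylesheetHref(String xml){
    try{
      Document doc =
          DocumentBuilderFactory.newInstance()
              .newDocumentBuilder()
              .parse(new InputSource(
                  new StringReader(xml)));
      for (Node n = doc.getFirstChild();
           n != null; n = n.getNextSibling()){
        if (n.getNodeType()
            == Node.PROCESSING_INSTRUCTION_NODE){
          ProcessingInstruction pi =
              (ProcessingInstruction) n;
          if ("xml-stylesheet".equals(
                  pi.getTarget())){
            Matcher m = HREF.matcher(pi.getData());
            if (m.find()){
              return m.group(1);
            }//end if
          }//end if
        }//end if
      }//end for loop
      return null;
    }catch(Exception e){
      throw new IllegalStateException(e);
    }//end catch
  }//end findStylesheetHref

  public static void main(String[] args){
    System.out.println(findStylesheetHref(
        "<?xml version=\"1.0\"?>"
      + "<?xml-stylesheet type=\"text/xsl\" "
      + "href=\"Dom03.xsl\"?><A/>"));
  }//end main
}//end class StylesheetPiSketch
```

The sketch looks only at the children of the document node, which is where an xml-stylesheet processing instruction normally appears.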

Order of execution

The program begins by executing code to transform the incoming XML file in a way that mimics the XSLT transformation. Along the way, it saves the processing instruction that identifies the stylesheet file for later use by the XSLT transformation process. Otherwise, the code that performs the XSLT transformation would have to search the DOM tree for that processing instruction.

Then the program uses the XSLT style sheet to transform the XML file into a result file by performing an XSLT transformation under program control.

Errors, exceptions, and testing

No effort was made to provide meaningful information about errors and exceptions.

The program was tested using SDK 1.4.2 under WinXP.

Will discuss in fragments

I will discuss this program in fragments. A complete listing of the program is shown in Listing 23 near the end of the lesson.

Much of the code in this program is very similar to, or identical to, code that I discussed in previous lessons. I will discuss that repetitious code only briefly, if at all.

The main method

Listing 8 shows an abbreviated version of the beginning of the class named Dom03 and the ending of the main method.

public class Dom03{
//Code deleted for brevity

//In main method
//Process the DOM tree
thisObj.processDocumentNode(document);

//Perform XSLT transformation
thisObj.doXslTransform(
document,argv[1],procInstr);

//Exception handling code deleted for brevity
}// end main()

Listing 8

The code in this portion of the program is identical to code that I discussed in detail in previous lessons, so I won't discuss it further. I included it here solely to establish the context for discussion of code that is to follow.

Behavior of this code

Briefly, the code in the main method does the following:

Performs all the steps necessary to parse the input XML file, producing an object of type Document whose reference is saved in a reference variable named document.
Instantiates an object of the Dom03 class and saves its reference in a reference variable named thisObj.
Invokes the method named processDocumentNode on thisObj to transform the DOM tree to an output file using program code to perform the transformation.
Invokes the method named doXslTransform on thisObj to perform an XSLT transformation using an XSL stylesheet.

The methods named processDocumentNode and doXslTransform are methods of my own design.
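For readers who have not seen the earlier lessons, the parse step and the XSLT step can be sketched with standard JAXP calls as shown below. This is only an outline under stated assumptions, not the code in Dom03: the class name PipelineSketch and the method name parseAndTransform are hypothetical, and the sketch works on strings rather than on the files named on the command line.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PipelineSketch {
  //Parses the XML into a DOM tree, then uses the
  // stylesheet to transform that tree.
  static String parseAndTransform(
                          String xml, String xsl){
    try{
      //Step 1: parse the input into a Document.
      DocumentBuilderFactory factory =
          DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      Document document =
          factory.newDocumentBuilder().parse(
              new InputSource(
                  new StringReader(xml)));

      //Step 2: build a Transformer from the
      // stylesheet and transform the DOM tree.
      Transformer transformer =
          TransformerFactory.newInstance()
              .newTransformer(new StreamSource(
                  new StringReader(xsl)));
      StringWriter result = new StringWriter();
      transformer.transform(
          new DOMSource(document),
          new StreamResult(result));
      return result.toString();
    }catch(Exception e){
      throw new IllegalStateException(e);
    }//end catch
  }//end parseAndTransform

  public static void main(String[] args){
    String xsl =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl="
      + "\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" "
      + "omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"C\">"
      + "<p><xsl:apply-templates/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";
    System.out.println(parseAndTransform(
        "<A><C>Some text</C></A>", xsl));
  }//end main
}//end class PipelineSketch
```

The same Document object could also be handed to hand-written processing code before or after the transformation, which is the arrangement that the main method in Listing 8 uses.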

The processDocumentNode method

The beginning of the processDocumentNode method is shown in Listing 9. This version of the method is much longer than versions discussed in previous lessons.

void processDocumentNode(Node node){
//Create the beginning of the XHTML document
out.println("<?xml version=\"1.0\" "
+ "encoding=\"UTF-8\"?>");
out.println(
"<!DOCTYPE html PUBLIC \"-//W3C//DTD "
+ "XHTML 1.0 Transitional//EN\" "
+ "\"http://www.w3.org/TR/xhtml1/"
+ "DTD/xhtml1-transitional.dtd\">");

Listing 9

However, even though this version is much longer, there is nothing in the method that should be a stretch for capable Java programmers. All of the new code in this method is in the form of print statements to cause appropriate XHTML text to appear in the output.

Produces all required output text

This method is used to produce any text required in the output at the document level, such as the XML declaration for an XML document, or the DTD reference for an XHTML document. As you can see from Listing 9, the code in this method does both.

The code in Listing 9 writes an XML declaration, and then writes XHTML text into the output that matches text produced by the green xsl:output element in Figure 3. I have already discussed the need for the XHTML DTD in the XHTML file, so I won't discuss it further here.

The start tag for the html root element

The code in Listing 10 writes the start tag for the html root element of the XHTML document. Then it writes the XML namespace attribute in the output.

(The stylesheet shown in Figure 3 doesn't write an XML namespace attribute for reasons that I explained earlier.)

out.println("<html xmlns=\"http://www.w3."
+ "org/1999/xhtml\" xml:lang=\"en\""
+ " lang=\"en\">");
out.println("<head>");
out.println(
"<meta http-equiv=\"content-type\" "
+ "content=\"text/html; charset="
+ "UTF-8\"/>");
out.println("<title>Generated XHTML file"
+ "</title>");
out.println("</head>");
out.println("<body>");
//Output similar to the above applies to
// most XHTML documents.

//Now set up an XHTML table. This is
// peculiar to this particular example.
out.println("<table border=\"2\" " +
"cellspacing=\"0\" " +
"cellpadding=\"0\" " +
"width=\"330\" " +
"bgcolor=\"#FFFF00\">" +
"<tr><td>");

Listing 10

Following this, the code in Listing 10 writes the same XHTML text in the output that is written by the first red template rule in Figure 3.

Invoke the processNode method

Then the code in Listing 11 invokes the processNode method to trigger a recursive process that processes the entire DOM tree.

processNode(node);

//Finish the XHTML table. This output is
// peculiar to this particular example.
out.println("</td></tr></table>");

//Now finish the output document and flush
// the output buffer. This would apply to
// most XHTML documents.
out.println("</body></html>");
out.flush();
}//end processDocumentNode

Listing 11

When the processNode method returns, the code in Listing 11 writes XHTML text into the output consisting of end tags for the table, the body, and the document. That completes the production of the XHTML document, so the code in Listing 11 flushes the output buffer to assure that everything is written into the file.

Invoke the doXslTransform method

Then the processDocumentNode method terminates and returns control to the main method in Listing 8. At that point, the doXslTransform method is invoked to perform an XSLT transformation on the XML file using the stylesheet discussed earlier in this lesson.

Quite a lot of code was added to the processDocumentNode method, but as mentioned earlier, all of that code was added simply to write XHTML text into the output at the document level. All of the changes to the program that were significant from a programming viewpoint were either included in the processNode method, or were part of a new method named forEach.

Invoke the processNode method

Despite the name that I chose to give to the processDocumentNode method, it doesn't actually process the document node directly. Rather, after sending any required text to the output, it invokes the method named processNode (see Listing 11) to actually process the document node.

(Note that the Document object's reference is passed to the method named processNode in Listing 11.)

The processNode method

As you have learned in previous lessons, there are seven possible types of nodes in an XML document:

root or document node
element node
attribute node
text node
comment node
processing instruction node
namespace node

The processNode method handles the first six types and ignores namespace nodes.

(Apparently it is not possible to handle namespace nodes in a Java program because there is no constant in the Node interface that can be used to identify namespace nodes. This will become clear as we examine the code in the processNode method.)
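The six node types that can be identified through constants in the Node interface can be seen in a small experiment. The following is a minimal sketch, not code from Dom03: the class name NodeTypeSketch, the method names typeName, walk, and describe, and the sample XML are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTypeSketch {
  //Maps the type code of a node to a name.
  static String typeName(Node node){
    switch (node.getNodeType()){
      case Node.DOCUMENT_NODE:
        return "document";
      case Node.ELEMENT_NODE:
        return "element";
      case Node.ATTRIBUTE_NODE:
        return "attribute";
      case Node.TEXT_NODE:
        return "text";
      case Node.COMMENT_NODE:
        return "comment";
      case Node.PROCESSING_INSTRUCTION_NODE:
        return "processing-instruction";
      default:
        return "other";
    }//end switch
  }//end typeName

  //Records the type of the node, then of its
  // attributes, then of its descendants.
  static void walk(Node node, List<String> names){
    names.add(typeName(node));
    //getAttributes returns null for every node
    // type other than element.
    NamedNodeMap attrs = node.getAttributes();
    if (attrs != null){
      for (int i = 0; i < attrs.getLength(); i++){
        names.add(typeName(attrs.item(i)));
      }//end for loop
    }//end if
    for (Node child = node.getFirstChild();
         child != null;
         child = child.getNextSibling()){
      walk(child, names);
    }//end for loop
  }//end walk

  static List<String> describe(String xml){
    try{
      Node doc =
          DocumentBuilderFactory.newInstance()
              .newDocumentBuilder()
              .parse(new InputSource(
                  new StringReader(xml)));
      List<String> names = new ArrayList<>();
      walk(doc, names);
      return names;
    }catch(Exception e){
      throw new IllegalStateException(e);
    }//end catch
  }//end describe

  public static void main(String[] args){
    System.out.println(describe(
        "<?demo data?><A id=\"1\">"
      + "<!--note-->text</A>"));
  }//end main
}//end class NodeTypeSketch
```

Note that attribute nodes are not children of their element in the DOM, so the sketch has to reach them through the getAttributes method.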

Get and save the node type

The processNode method in this program contains quite a few changes relative to the programs that I discussed in previous lessons. Therefore, I will discuss the processNode method in detail.

Code that you write in this method (and in the processDocumentNode method discussed above) is somewhat analogous to writing an XSL stylesheet to be used in an XSLT transformation.

Test for a valid node, and get its type

The beginning of the processNode method is shown in Listing 12. The method receives an incoming parameter of type Node, which can represent any of the seven types of nodes in the above list.

As you can see in Listing 12, if the parameter doesn't point to an actual object, the method reports that there is nothing to do and returns, as opposed to throwing a NullPointerException.

void processNode(Node node){

  try{
    if (node == null){
      System.err.println(
                   "Nothing to do, node is null");
      return;
    }//end if

    //Get the actual type of the node
    int type = node.getNodeType();

Listing 12

The final statement in Listing 12 invokes the getNodeType method to get and save the type of the node whose reference was received as an incoming parameter.

Process the node

Each time the processNode method is invoked, it receives a Node object's reference as an incoming parameter. The code in Listing 12 determines the type of the incoming node. Listing 13 shows the beginning of a switch statement that is used to initiate the processing of each incoming node based on its type.

    switch (type){
      case Node.DOCUMENT_NODE:{
        if(false){
          //unreachable in this program
        }else{//invoke default behavior
          defElOrRtNodeTemp(node);
        }//end else
        break;
      }//end case DOCUMENT_NODE

Listing 13

The switch statement has six cases to handle six types of nodes, plus a default case to ignore namespace nodes.

The DOCUMENT_NODE case

The code in Listing 13 will be executed whenever the incoming method parameter points to a document node.

(Note that this will happen only once during the processing of a DOM tree. The first node processed will always be the document node, and there is only one document node in a DOM tree.)

This code is identical to code that I have discussed in previous lessons, so I won't discuss it further. I included it here solely to help you get oriented as to the overall control structure of the processNode method.

I do want to point out, however, that when the processNode method is invoked on a document node, the code in Listing 13 causes a method named defElOrRtNodeTemp to be invoked. This method emulates the behavior of a built-in template rule, which in this case causes templates to be applied to all child nodes of the document node.

Creating custom template rules

Although this lesson does not create a custom template rule for document nodes, the process for creating a custom template rule is as follows:

Go to this method named processNode.
Identify the case for the node type in the switch statement.
Change the conditional clause in the if statement for that case to implement a match for a particular node of that type.
Write code in the body of the if statement to implement the custom template rule.

If the modified conditional clause evaluates to true, the custom template rule will be executed. If it evaluates to false, the default rule will be executed.
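The pattern described above can be sketched in a stripped-down form as follows. This is an illustration under stated assumptions rather than the processNode method from Dom03: the class name CustomRuleSketch, the helper method applyToChildren (which stands in for both the applyTemplates method and the default rule), the method named run, and the use of a StringBuilder in place of the output stream are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CustomRuleSketch {
  private final StringBuilder out =
                              new StringBuilder();

  void processNode(Node node){
    if (node == null){
      return;
    }//end if
    switch (node.getNodeType()){
      case Node.DOCUMENT_NODE:{
        //No custom rule, default behavior only
        applyToChildren(node);
        break;
      }//end case DOCUMENT_NODE
      case Node.ELEMENT_NODE:{
        if("Q".equals(node.getNodeName())){
          //Custom rule for elements named Q
          out.append("<h1>");
          applyToChildren(node);
          out.append("</h1>");
        }else{//invoke default behavior
          applyToChildren(node);
        }//end else
        break;
      }//end case ELEMENT_NODE
      case Node.TEXT_NODE:{
        //Default behavior: output the text
        out.append(node.getNodeValue());
        break;
      }//end case TEXT_NODE
      default:
        break;//ignore everything else
    }//end switch
  }//end processNode

  //Hypothetical helper: process every child.
  private void applyToChildren(Node node){
    for (Node child = node.getFirstChild();
         child != null;
         child = child.getNextSibling()){
      processNode(child);
    }//end for loop
  }//end applyToChildren

  static String run(String xml){
    try{
      Node doc =
          DocumentBuilderFactory.newInstance()
              .newDocumentBuilder()
              .parse(new InputSource(
                  new StringReader(xml)));
      CustomRuleSketch sketch =
                          new CustomRuleSketch();
      sketch.processNode(doc);
      return sketch.out.toString();
    }catch(Exception e){
      throw new IllegalStateException(e);
    }//end catch
  }//end run

  public static void main(String[] args){
    System.out.println(run(
        "<A><Q>A Big Header</Q><X>plain</X></A>"));
  }//end main
}//end class CustomRuleSketch
```

In this sketch, the element named Q triggers the custom rule, while the elements named A and X fall through to the default behavior.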

The ELEMENT_NODE case

Most of the changes to this program (as compared to programs discussed in previous lessons) consist of changes to the code that processes element nodes in the switch statement. The code for element node case is rather long, so I will discuss it in fragments.

(A new method named forEach was also added to the program. I will discuss that method in detail later.)

A match for element nodes named B

The beginning of the case for element nodes is shown in Listing 14.

      case Node.ELEMENT_NODE:{

        //Compare names by content using equals.
        // The == operator compares references.
        if("B".equals(node.getNodeName())){
          applyTemplates(node,null);
        }//end if

Listing 14

Note the similarity of the code in Listing 14 and the XSLT template rule shown in Listing 4. When the node being processed is an element node whose name is B, the code in Listing 14 invokes the applyTemplates method to cause templates to be applied to all child nodes of the node named B.

I discussed the applyTemplates method in earlier lessons, and won't repeat that discussion here.

A match for element nodes named C

Listing 15 shows code that matches element nodes named C.

        else if("C".equals(node.getNodeName())){
          out.println("<p>");
          applyTemplates(node,null);
          out.println("</p>");
        }//end if

Listing 15

This code applies templates to all child nodes of the node named C, and wraps the output produced by that operation in an XHTML paragraph element, <p>...</p>.

Compare the code in Listing 15 with the second red XSLT template rule in Figure 3.

A match for element nodes named D

Listing 16 shows code that matches element nodes named D.

        else if("D".equals(node.getNodeName())){
          out.println("List of items in E");
          out.println("<ul>");
          forEach(node,"E");
          out.println("</ul>");

          out.println("List of items in F");
          out.println("<ol>");
          forEach(node,"F");
          out.println("</ol>");
        }//end if

Listing 16

I'll start my discussion of the code in Listing 16 by comparing it with the template rule shown in Listing 5. The behavior of this code is the same as the behavior of the template rule in Listing 5. However, the execution structure is slightly different.

The code in Listing 16 begins by sending some text followed by the start tag for an unordered list to the output. Then it invokes the forEach method, passing the context node and the child node name E as parameters.

The forEach method

The entire forEach method is shown in Listing 17. This method, in conjunction with the processNode method, emulates the behavior of an xsl:for-each XSLT element.

private void forEach(Node node,String select){
  NodeList children = node.getChildNodes();
  if (children != null){
    int len = children.getLength();
    //Iterate on NodeList of child nodes,
    // processing nodes that match the select.

    for (int i = 0; i < len; i++){
      if(children.item(i).getNodeName().
                                equals(select)){
        //Make a recursive call from within
        // this iterative template rule.
        processNode(children.item(i));
      }//end if
    }//end for loop
  }//end if
}//end forEach

Listing 17

If you have been studying the previous lessons in this series, the structure of the method should be familiar to you.

The structure of the forEach method

The method receives two parameters:

A reference to a particular node of type Node.
The name of a node that should be a child node of the node.

The purpose of the method is to access each child node that matches the name, in the order in which they appear in the DOM tree, and to apply a particular operation to each of those nodes.

(A future lesson will show you how to access the nodes in sorted order.)

Get and iterate on a list of child nodes

The code in Listing 17 starts by getting a list of all the child nodes of the node referenced by the first incoming parameter.

Then it iterates on the list, identifying those nodes whose names match the second incoming parameter. When it finds a match, it makes a recursive call to the processNode method where the operation to be applied to that node is defined.

When it has processed all the nodes in the list, it returns to the code shown in Listing 16.
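The select-and-iterate part of this method can be exercised on its own. The following is a minimal sketch under stated assumptions, not the method from Listing 17: the class name ForEachSketch and the method name run are hypothetical, and instead of making a recursive call to processNode, this version of forEach simply records the text content of each matching child so that the order of the visits can be examined.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ForEachSketch {
  //Visits the child elements whose names match
  // select, in document order, and records the
  // text content of each one.
  static List<String> forEach(
                        Node node, String select){
    List<String> visited = new ArrayList<>();
    NodeList children = node.getChildNodes();
    int len = children.getLength();
    for (int i = 0; i < len; i++){
      Node child = children.item(i);
      if(child.getNodeType() == Node.ELEMENT_NODE
         && select.equals(child.getNodeName())){
        //Dom03 calls processNode at this point.
        visited.add(child.getTextContent());
      }//end if
    }//end for loop
    return visited;
  }//end forEach

  //Parses the XML and iterates on the children
  // of the root element.
  static List<String> run(
                      String xml, String select){
    try{
      Document doc =
          DocumentBuilderFactory.newInstance()
              .newDocumentBuilder()
              .parse(new InputSource(
                  new StringReader(xml)));
      return forEach(
          doc.getDocumentElement(), select);
    }catch(Exception e){
      throw new IllegalStateException(e);
    }//end catch
  }//end run

  public static void main(String[] args){
    String xml = "<D><E>e1</E><F>f1</F><E>e2</E>"
               + "<F>f2</F><E>e3</E></D>";
    System.out.println(run(xml, "E"));
    System.out.println(run(xml, "F"));
  }//end main
}//end class ForEachSketch
```

Each call makes one pass over all the children and skips the ones that don't match, which is why the interleaved F nodes have no effect on the pass for E.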

Process all child nodes named E

The first time this method is called in Listing 16, it is instructed to identify and perform an operation on all the child nodes named E. When the forEach method calls the processNode method, passing an E node's reference as a parameter, the code shown in Listing 18 is executed. (Note that this is part of the element node case in the switch statement belonging to the processNode method.)

        else if("E".equals(node.getNodeName())){
          out.println("<li>");
          applyTemplates(node,null);
          out.println("</li>");
        }//end if

Listing 18

Note that I could have put the code in Listing 18 inside the forEach method. However, I elected to do it the way that I did to make the forEach method more general, and confine all the code for custom template rules to the processDocumentNode and processNode methods.

As you can see, the code in Listing 18 causes templates to be applied to all child nodes of the node named E, and causes the output produced by that operation to be surrounded by the start and end tags for an XHTML list item, <li>...</li>.

Finished with nodes named E

That completes the operation necessary to emulate the template rule in Listing 5 for nodes named E, and completes the top half of the code being executed in Listing 16.

Process all child nodes named F

The bottom half of the code in Listing 16 does essentially the same thing, except that it iterates on child nodes named F and wraps the results in the XHTML tags for an ordered list, <ol>...</ol>.

In this case, the forEach method will isolate nodes named F and pass them recursively to the processNode method.

At that point, the code in Listing 19 will be executed with exactly the same behavior as the code in Listing 18, except that it is applied to nodes named F instead of nodes named E.

        else if("F".equals(node.getNodeName())){
          out.println("<li>");
          applyTemplates(node,null);
          out.println("</li>");
        }//end if

Listing 19

A match for element nodes named G

Listing 20 shows custom code that applies to nodes named G.

        else if("G".equals(node.getNodeName())){
          out.println("<b>");
          applyTemplates(node,null);
          out.println("</b>");
        }//end if

Listing 20

This code applies templates to the child nodes of nodes named G, and wraps the output from that operation in the XHTML tags for bold, <b>...</b>.

Compare this code to the template rule shown in Listing 7.

A match for elements Q, R, S, and T

Listing 21 shows custom code that applies to nodes named Q, R, S, and T.

//Create four levels of XHTML headers
else if("Q".equals(node.getNodeName())){
out.println("<h1>");
applyTemplates(node,null);
out.println("</h1>");
}//end if

else if("R".equals(node.getNodeName())){
out.println("<h2>");
applyTemplates(node,null);
out.println("</h2>");
}//end if

else if("S".equals(node.getNodeName())){
out.println("<h3>");
applyTemplates(node,null);
out.println("</h3>");
}//end if

else if("T".equals(node.getNodeName())){
out.println("<h4>");
applyTemplates(node,null);
out.println("</h4>");
}//end if

Listing 21

Similar blocks of code

The four blocks of code are very similar. Each block applies templates to the child nodes of the matching node, and surrounds the output from that operation with the XHTML tags for a header, such as <h1>...</h1>. Only the header level differs from one block to the next.

Compare the block of code in Listing 21 that matches Q with the template rule in Listing 3. Compare all four of the code blocks to the last four template rules in Figure 3.
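Because the four blocks differ only in the tag name, the mapping from element name to header tag could also be held in a lookup table. The following sketch shows that alternative. It is not the approach used in Dom03.java, and the class name HeaderMapSketch and the method name wrap are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone class, used for illustration only.
class HeaderMapSketch {

  // XML element name -> XHTML header tag.
  static final Map<String, String> TAGS =
      new HashMap<String, String>();
  static {
    TAGS.put("Q", "h1");
    TAGS.put("R", "h2");
    TAGS.put("S", "h3");
    TAGS.put("T", "h4");
  }

  // Wrap the content in the header tag registered for the
  // element name. Content for unregistered names is returned
  // unchanged.
  static String wrap(String elementName, String content) {
    String tag = TAGS.get(elementName);
    if (tag == null) {
      return content;
    }
    return "<" + tag + ">" + content + "</" + tag + ">";
  }

  public static void main(String[] args) {
    // Prints <h1>A Big Header</h1>
    System.out.println(wrap("Q", "A Big Header"));
    // Prints <h4>A Smallest Header</h4>
    System.out.println(wrap("T", "A Smallest Header"));
  }
}
```

The trade-off is that a table keeps the code short, while the separate blocks in Listing 21 stay in one-to-one correspondence with the template rules in the stylesheet, which makes the two easier to compare.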

Processing nodes with no match

This XML document contains several nodes for which there is no matching template in the stylesheet and no matching code block in this program, including the root element node named A.

Whenever the XSLT processor encounters an element node for which there is no matching template rule, it executes a built-in rule for element nodes.

When this program encounters an element node for which there is no matching code block in the element node case of the switch statement, it executes the code shown in Listing 22.

else{//invoke default behavior
defElOrRtNodeTemp(node);
}//end else
break;
}//end case ELEMENT_NODE

Listing 22

As you can see, the code in Listing 22 invokes the method named defElOrRtNodeTemp, passing the unmatched node as a parameter. This is a method that mimics the built-in behavior of the XSLT processor. I discussed it in detail in an earlier lesson, and won't repeat that discussion here.
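As a reminder of what the built-in rules amount to when no custom rule matches, here is a minimal sketch. It is not the code from Dom03.java: the class name BuiltInRulesSketch and the method name applyBuiltIn are hypothetical, and the result is returned as a String rather than printed. Element and document nodes recurse into their children, text is copied, and comments and processing instructions produce nothing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-alone class, used for illustration only.
class BuiltInRulesSketch {

  // Parse an XML string into a DOM Document.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse XML", e);
    }
  }

  // Apply only the built-in behavior to a node and return
  // the text that would reach the output.
  static String applyBuiltIn(Node node) {
    switch (node.getNodeType()) {
      case Node.TEXT_NODE:
      case Node.CDATA_SECTION_NODE:
        // Text is copied to the output.
        return node.getNodeValue();
      case Node.ELEMENT_NODE:
      case Node.DOCUMENT_NODE: {
        // Elements and the root contribute nothing themselves.
        // They only pass control to their children.
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
          sb.append(applyBuiltIn(children.item(i)));
        }
        return sb.toString();
      }
      default:
        // Comments, processing instructions, and any other
        // node types produce no output.
        return "";
    }
  }

  public static void main(String[] args) {
    Document doc = parse(
        "<A><X>hello <!-- note --><Y>world</Y></X></A>");
    // Prints hello world
    System.out.println(applyBuiltIn(doc));
  }
}
```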

The remainder of the processNode method

That completes the discussion of the case for element nodes in the switch statement of the processNode method. The following cases have not yet been discussed:

Text nodes
Attribute nodes
Comment nodes
Processing instruction nodes
Namespace nodes (default case)

No new code for these nodes

The cases for these nodes contain no new code relative to the code discussed in previous lessons. Therefore, I won't repeat that discussion in this lesson.

That completes the discussion of the processNode method, and leaves the following methods not yet discussed:

main
defTextOrAttrTemp
defElOrRtNodeTemp
defComOrProcInstrTemp
applyTemplates
valueOf
doXslTransform

However, these methods are identical to methods having the same name that I discussed in detail in earlier lessons. I won't repeat that discussion in this lesson.

The program output

The output produced by this program is essentially the same as the XSLT transform output discussed in the early part of the lesson. The output shown in rendered form in Figure 2 and in raw XHTML form in Figure 4 represents the output of both the program and the XSLT transform.
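To see the XSLT half of the comparison in isolation, the following sketch runs a one-rule stylesheet through the standard JAXP Transformer API. It is only an illustration. The class name XsltSketch is hypothetical, the stylesheet is an inline string containing just a rule for C (modeled on the rule for C in Dom03.xsl), and the omit-xml-declaration setting is an addition that keeps the output short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical stand-alone class, used for illustration only.
class XsltSketch {

  // A tiny stylesheet with a single custom template rule.
  static final String XSL =
      "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\""
      + " omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"C\">"
      + "<p><xsl:apply-templates/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

  // Transform an XML string using the stylesheet above.
  static String transform(String xml) {
    try {
      Transformer transformer =
          TransformerFactory.newInstance().newTransformer(
              new StreamSource(new StringReader(XSL)));
      StringWriter out = new StringWriter();
      transformer.transform(
          new StreamSource(new StringReader(xml)),
          new StreamResult(out));
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException(
          "Transformation failed", e);
    }
  }

  public static void main(String[] args) {
    // B has no matching rule, so the built-in rule passes
    // control to C, whose text is wrapped in a paragraph.
    System.out.println(transform("<B><C>Text block 1.</C></B>"));
  }
}
```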

Run the Program

I encourage you to copy the Java code, XML file, and XSL file from the listings near the end of this lesson. Compile and execute the program. Experiment with the files, making changes, and observing the results of your changes.

Summary

In this lesson, I showed you how to use XSLT to transform an XML document into an XHTML document. I also showed you how to write Java code to perform the same transformation.

What's Next?

The next several lessons in this series will illustrate parallel Java code and XSLT transformations to transform XML documents into XHTML documents. The sample programs will illustrate various aspects of the manipulation of a DOM tree using Java code.

Complete Program Listings

Complete listings of the various files discussed in this lesson are contained in the listings that follow.

/*File Dom03.java
Copyright 2003 R.G.Baldwin

This program transforms an XML file into an XHTML
file using a combination of recursive and
iterative processing.

New material added to this lesson includes a
method that emulates an xsl:for-each template
rule.

This program compares the transformation of an
XML file to an XHTML file using two different
approaches:

1. An XSLT style sheet
2. Program code that emulates the behavior of the
XSL transformation.

Two XHTML files are produced, one by the XSL
transformation and one by the program code.

Both output files validate as XHTML at
http://validator.w3.org/file-upload.html

Both also validate as HTML at
http://www.htmlhelp.com/tools/validator/upload.html

Both also validate using the program named
DomTree02.java, which means that they validate as
XML under JAXP.

The program requires three command line
parameters in the following order:
1. The name of the input XML file - must be
Dom03.xml
2. The name of the XHTML output file to be
produced by the XSL transformation.
3. The name of the XHTML output file to be
produced by the program code that emulates
the XSL transformation.

This program implements all six built-in default
template rules for an XSLT processor. In
addition, it implements several other
template rules that are required to support
the built in rules, such as xsl:value-of and
xsl:for-each.

The program creates several custom template
rules.

To create a custom template rule:
1. Go to the processNode method.
2. Identify the node type.
3. Change the conditional clause in the if
statement to implement the match.
4. Write code in the body of the if statement to
implement the custom rule.

If the modified conditional clause evaluates to
true, the custom rule will be executed. If
false, the default rule will be executed.

In particular, this program illustrates Java code
that emulates the XSLT templates in the file
named Dom03.xsl.

The name of the XSL stylesheet file is extracted
from the processing instruction in the XML file.

The program begins by executing code to transform
the incoming XML file in a way that mimics the
XSL Transformation. Along the way, it saves the
processing instructions containing the ID of the
stylesheet file for use by the XSLT process
later. Otherwise, the code that performs the
XSL transformation later would have to search the
DOM tree for the XSL stylesheet file.

Then the program uses the XSLT style sheet to
transform the XML file into a result file.

This is not a general purpose program. This
program, and the XSLT file named Dom03.xsl
are specifically designed to transform the
contents of the file named Dom03.xml into an
XHTML file.

No effort was made to provide meaningful
information about errors and exceptions.

Tested with SDK 1.4.2 under WinXP.
************************************************/

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;

import org.w3c.dom.*;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.*;

import java.util.*;
import java.io.*;

public class Dom03{

PrintWriter out;//output stream
//Save processing instruction nodes here
static Vector procInstr = new Vector();

public static void main(String argv[]){
if (argv.length != 3){
System.err.println(
"usage: java Dom03 "
+ "xmlFileIn "
+ "xformFileOut "
+ "codeFileOut");
System.exit(0);
}//end if

try{
//Get a factory object for DocumentBuilder
// objects
///
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();

//Configure the factory object. Change
// the following parameter to false for a
// non-validating parser.
///
factory.setValidating(true);
factory.setNamespaceAware(false);
//The following statement causes the parser
// to ignore cosmetic whitespace between
// elements.
///
factory.
setIgnoringElementContentWhitespace(true);

//Get a DocumentBuilder (parser) object
///
DocumentBuilder builder =
factory.newDocumentBuilder();

//Parse the XML input file to create a
// Document object that represents the
// input XML file.
///
Document document = builder.parse(
new File(argv[0]));

//Instantiate an object of this class
///
Dom03 thisObj = new Dom03();

//TRANSFORMATION THROUGH PROGRAM CODE
//Use program code to transform the
// DOM tree into an output file.
//
//Get an output stream for the output
// produced by the program code. This
// stream object is used by several
// methods, so it was instantiated at this
// point and saved as an instance variable
// of the object.
///
thisObj.out = new PrintWriter(
new FileOutputStream(argv[2]));

//Process the DOM tree, beginning with the
// Document node to produce the output.
// Invocation of processDocumentNode starts
// a recursive process that processes the
// entire DOM tree.
///
thisObj.processDocumentNode(document);

//XSLT TRANSFORMATION
//Use XSLT to transform the DOM tree into
// an output file. Note that the success
// of this method call depends on the
// stylesheet processing instruction having
// been saved while the transformation was
// being performed using program code
// above. Otherwise, it would be necessary
// to include the code in this method to
// search the DOM tree for the stylesheet
// processing instruction. All processing
// instructions are saved in a Vector
// object, which is passed as the third
// parameter to this method.
///
thisObj.doXslTransform(
document,argv[1],procInstr);

}catch(Exception e){
//Note that no effort was made to provide
// meaningful results in the event of an
// exception or error.
///
e.printStackTrace(System.err);
}//end catch
}// end main()
//-------------------------------------------//

//This method is used to produce any text
// required in the output at the document
// level, such as the XML declaration for an
// XML document.
void processDocumentNode(Node node){
//Create the beginning of the XHTML document
out.println("<?xml version=\"1.0\" "
+ "encoding=\"UTF-8\"?>");
out.println(
"<!DOCTYPE html PUBLIC \"-//W3C//DTD "
+ "XHTML 1.0 Transitional//EN\" "
+ "\"http://www.w3.org/TR/xhtml1/"
+ "DTD/xhtml1-transitional.dtd\">");
out.println("<html xmlns=\"http://www.w3."
+ "org/1999/xhtml\" xml:lang=\"en\""
+ " lang=\"en\">");
out.println("<head>");
out.println(
"<meta http-equiv=\"content-type\" "
+ "content=\"text/html; charset="
+ "UTF-8\"/>");
out.println("<title>Generated XHTML file"
+ "</title>");
out.println("</head>");
out.println("<body>");
//Output similar to the above applies to
// most XHTML documents.

//Now set up an XHTML table. This is
// peculiar to this particular example.
out.println("<table border=\"2\" " +
"cellspacing=\"0\" " +
"cellpadding=\"0\" " +
"width=\"330\" " +
"bgcolor=\"#FFFF00\">" +
"<tr><td>");

//Go process the root (document) node. This
// method call triggers a recursive process
// that processes the entire DOM tree.
processNode(node);

//Finish the XHTML table. This output is
// peculiar to this particular example.
out.println("</td></tr></table>");

//Now finish the output document and flush
// the output buffer. This would apply to
// most XHTML documents.
out.println("</body></html>");
out.flush();
}//end processDocumentNode
//-------------------------------------------//

//There are seven kinds of nodes:
// root or document
// element
// attribute
// text
// comment
// processing instruction
// namespace
//
//This method handles the first six.
// Apparently it is not possible to handle
// namespace nodes in Java because there is
// no constant in the Node class to identify
// namespace nodes
///
void processNode(Node node){

try{
if (node == null){
System.err.println(
"Nothing to do, node is null");
return;
}//end if

//Process the incoming node based on its
// type.
///
int type = node.getNodeType();

//To define an overriding template rule,
// insert the matching condition in the
// conditional clause of the if statement,
// and provide code to implement the rule
// in the body of the if statement. If the
// conditional clause evaluates to true,
// the default rule for that element type
// will not be processed.
///
switch (type){
case Node.TEXT_NODE:{
if(true){
out.println(node.getNodeValue());
}else{//invoke default behavior
//This won't be reached in this
// example, but I will leave it
// here as a reminder of the
// default behavior.
out.print(defTextOrAttrTemp(node));
}//end else
break;
}//end case Node.TEXT_NODE

case Node.ATTRIBUTE_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
out.print(defTextOrAttrTemp(node));
}//end else
break;
}//end case Node.ATTRIBUTE_NODE

case Node.ELEMENT_NODE:{
if("B".equals(node.getNodeName())){
//Process all XML child nodes
// recursively
applyTemplates(node,null);
}//end if

else if("C".equals(node.getNodeName())){
//Begin XHTML paragraph element
out.println("<p>");
//Process all XML child nodes
// recursively
applyTemplates(node,null);
//End XHTML paragraph element
out.println("</p>");
}//end if

else if("D".equals(node.getNodeName())){
//First process the child nodes
// named E.
out.println("List of items in E");
//Begin XHTML unordered list
out.println("<ul>");
//Iteratively put text from E
// elements and their children into
// the list.
forEach(node,"E");
//End XHTML unordered list
out.println("</ul>");

//Now process the child nodes
// named F.
out.println("List of items in F");
//Begin XHTML ordered list
out.println("<ol>");
//Iteratively put text from F
// elements and their children in the
// list.
forEach(node,"F");
//End XHTML ordered list
out.println("</ol>");
}//end if

else if("G".equals(node.getNodeName())){
//Display children as XHTML bold
out.println("<b>");
applyTemplates(node,null);
out.println("</b>");
}//end if

//Create four levels of XHTML headers
else if("Q".equals(node.getNodeName())){
out.println("<h1>");
applyTemplates(node,null);
out.println("</h1>");
}//end if

else if("R".equals(node.getNodeName())){
out.println("<h2>");
applyTemplates(node,null);
out.println("</h2>");
}//end if

else if("S".equals(node.getNodeName())){
out.println("<h3>");
applyTemplates(node,null);
out.println("</h3>");
}//end if

else if("T".equals(node.getNodeName())){
out.println("<h4>");
applyTemplates(node,null);
out.println("</h4>");
}//end if

//The following rules for E and F
// are invoked as a result of the
// behavior of the forEach method. The
// code could have been placed inside
// the forEach method. However, I
// elected to put it here in an attempt
// to confine all of the custom code
// to the methods named processNode and
// processDocumentNode.
else if("E".equals(node.getNodeName())){
//Create an XHTML list item
// containing information from child
// nodes.
out.println("<li>");
applyTemplates(node,null);
out.println("</li>");
}//end if

else if("F".equals(node.getNodeName())){
//Create an XHTML list item
// containing information from child
// nodes.
out.println("<li>");
applyTemplates(node,null);
out.println("</li>");
}//end if

else{//invoke default behavior
defElOrRtNodeTemp(node);
}//end else
break;
}//end case ELEMENT_NODE

case Node.DOCUMENT_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
defElOrRtNodeTemp(node);
}//end else
break;
}//end case DOCUMENT_NODE

case Node.COMMENT_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
defComOrProcInstrTemp(node);
}//end else
break;
}//end case COMMENT_NODE

case Node.PROCESSING_INSTRUCTION_NODE:{
if(false){
//Change conditional and write
// overriding handler here
}else{//invoke default behavior
//First save proc instr for later
// use.
procInstr.add(node);
//Now invoke default behavior.
defComOrProcInstrTemp(node);
}//end else
break;
}//end case PROCESSING_INSTRUCTION_NODE

default:{
//Ignore all other node types.
}//end default

}//end switch

}catch(Exception e){
e.printStackTrace(System.err);
}//end catch
}//end processNode(Node)
//-------------------------------------------//

//This method emulates the following default
// template rule:
// <xsl:template match="text()|@*">
// <xsl:value-of select="."/>
// </xsl:template>
///
String defTextOrAttrTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.ATTRIBUTE_NODE)
|| (nodeType == Node.TEXT_NODE)){
//Get and return the value of the context
// node.
///
return valueOf(node,".");
}else{
throw new Exception(
"Bad call to defTextOrAttrTemp method");
}//end else
}//end defTextOrAttrTemp
//-------------------------------------------//

//This method emulates the following default
// template rule:
// <xsl:template match="*|/">
// <xsl:apply-templates/>
// </xsl:template>
///
void defElOrRtNodeTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.ELEMENT_NODE) ||
(nodeType == Node.DOCUMENT_NODE)){
//Note that the following is a recursive
// method call.
///
applyTemplates(node,null);
}else{
throw new Exception(
"Bad call to defElOrRtNodeTemp");
}//end else
}//end defElOrRtNodeTemp
//-------------------------------------------//

//This method emulates the following default
// template rule:
// <xsl:template
// match="processing-instruction()|comment()"/>
///
String defComOrProcInstrTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.COMMENT_NODE) ||
(nodeType ==
Node.PROCESSING_INSTRUCTION_NODE)){
//According to Nutshell pg 148, the
// default rule for comments and processing
// instructions doesn't output anything
// into the result tree.
///
return "";//empty string
}else{
throw new Exception("Bad call to " +
"defComOrProcInstrTemp");
}//end else
}//end defComOrProcInstrTemp
//-------------------------------------------//

//See Nutshell, pg 148 for an explanation as to
// why it is not possible to write a Java
// method that emulates the default namespace
// template.
///
void defaultNamespaceTemplate(Node node)
throws Exception{
throw new Exception("See Nutshell pg 148 " +
"regarding default behavior for " +
"namespace template.");
}//end defaultNamespaceTemplate
//-------------------------------------------//

//Simulates an XSLT apply-templates rule.
// <xsl:apply-templates
// optional select = "..."
// optional mode = "..."
// >
//Note that the mode attribute is not supported
// in this version.
//If the select parameter is null, all child
// nodes are processed.
void applyTemplates(Node node,String select){
NodeList children = node.getChildNodes();
if (children != null){
int len = children.getLength();
//Iterate on NodeList of child nodes.
for (int i = 0; i < len; i++){
if((select == null) ||
(select.equals(children.item(i).
getNodeName()))){
//Note that the following is a
// recursive method call.
///
processNode(children.item(i));
}//end if
}//end for loop
}//end if children != null

}//end applyTemplates
//-------------------------------------------//

//This method simulates an XSLT
// <xsl:value-of select="???"/>
// The general form of the method call is
// valueOf(Node theNode,String select)
//
//The method recognizes three forms of call:
// valueOf(Node theNode,String "@attrName")
// valueOf(Node theNode,String ".")
// valueOf(Node theNode,String "nodeName")
//
//In the first form, the method returns the
// text value of the named attribute of
// theNode. An attribute is specified by a
// select value that begins with @. If the
// attribute doesn't exist, the method returns
// an empty string.
//
//In the second form, the method returns the
// concatenated text values of descendants of
// the context node.
//
//In the third form, the method returns the
// concatenated text values of all descendants
// of a specified child node of the context
// node. If the context node has more than one
// child node with the specified name, only the
// first one found is processed. The others
// are ignored.
//
//The method does not support the following,
// which are standard features of xsl:value-of:
// disable-output-escaping
// processing instruction nodes
// comment nodes
// namespace nodes
///

public String valueOf(Node node,String select){

if(select != null
&& select.startsWith("@")){
//This is a request for the value of an
// attribute. Returns empty string if the
// attribute doesn't exist on the element,
// or if the node cannot have attributes.
String attrName = select.substring(1);
NamedNodeMap attrList =
node.getAttributes();
Node attrNode = (attrList == null) ? null :
attrList.getNamedItem(attrName);
if(attrNode != null){
return attrNode.getNodeValue();
}else{
return "";//empty string
}//end else
}//end if on @

else if(select != null
&& select.equals(".")){
//This is a request to process the context
// node
int nodeType = node.getNodeType();
if(nodeType == Node.ELEMENT_NODE){
//Process the context node as an element
// node. Return the concatenated text
// values of all descendants of the
// context node.
NodeList childNodes =
node.getChildNodes();
int listLen = childNodes.getLength();
String nodeTextValue = "";//result

for(int j = 0; j < listLen; j++){
nodeTextValue +=
valueOf(childNodes.item(j),".");
}//end for loop
return nodeTextValue;
}else if(nodeType == Node.TEXT_NODE){
//Process the context node as a text
// node. Simply get and return its
// value.
return node.getNodeValue();
}else{
//ignore all other context node types
}//end else
}//end if for context node

else if(select != null){
//Process a child node whose name is
// specified by the value of the incoming
// parameter named select. Get and return
// the concatenated text values of all
// descendants of the specified child node.
//This process assumes that there is only
// one child node with the specified name
// and processes the first one that it
// finds.
NodeList children = node.getChildNodes();
int len = children.getLength();
for (int i = 0; i < len; i++){
//Trap the specified child node
if(children.item(i).getNodeName().
equals(select)){
//Make a recursive call and let
// existing code do the work.
return valueOf(children.item(i),".");
//The above return statement causes any
// additional child nodes having the
// same name to be ignored.
}//end if getNodeName == select
}//end for loop on all child nodes
}//end else if(select != null)
//Will reach here if the value of select is
// null, if the context node type is not
// supported, or if no matching child node
// is found.
///
return "";//empty string
}//end method valueOf
//-------------------------------------------//

//This method simulates an XSLT for-each
// template rule
private void forEach(Node node,String select){
NodeList children = node.getChildNodes();
if (children != null){
int len = children.getLength();
//Iterate on NodeList of child nodes,
// processing nodes that match the select.

for (int i = 0; i < len; i++){
if(children.item(i).getNodeName().
equals(select)){
//Make a recursive call from within
// this iterative template rule.
processNode(children.item(i));
}//end if
}//end for loop
}//end if
}//end forEach
//-------------------------------------------//

//This method uses an incoming XSLT stylesheet
// file to transform an incoming Document
// object into an output file. Note that the
// successful invocation of this method depends
// on the processing instruction containing the
// stylesheet having been saved in a Vector
// object that is received as an incoming
// parameter. Otherwise, this method would
// have to search the DOM for the stylesheet
// processing instruction.
///
void doXslTransform(Document document,
String outFile,
Vector procInstr)
throws Exception{
try{
//Get stylesheet ID from proc instr.
ProcessingInstruction pi = null;
boolean piFlag = false;
int size = procInstr.size();
//Search for a stylesheet in the Vector
// containing processing instruction nodes.
///
for(int i = 0; i < size; i++){
pi = (ProcessingInstruction)procInstr.
get(i);
if(pi.getTarget().startsWith(
"xml-stylesheet") && pi.getData().
startsWith("type=\"text/xsl\"")){
//Looks like a good stylesheet.
///
piFlag = true;
break;
}//end if
}//end for loop
if(piFlag == false){//still false?
throw new Exception(
"No valid stylesheet");
}//end if
//Get the stylesheet file reference
///
String xslFile = pi.getData().
substring(pi.getData().indexOf(
"href=")+6);
//Eliminate the quotation mark at the end
///
xslFile = xslFile.substring(
0,xslFile.length()-1);

//Get a TransformerFactory object
///
TransformerFactory xformFactory =
TransformerFactory.newInstance();
//Get an XSL Transformer object based on
// the XSL file discovered above.
///
Transformer transformer =
xformFactory.newTransformer(
new StreamSource(
new File(xslFile)));
//Get a DOMSource object that represents
// the DOM tree.
///
DOMSource source = new DOMSource(document);

//Get an output stream for the output
// file.
///
PrintWriter xformStream = new PrintWriter(
new FileOutputStream(outFile));

//Get a StreamResult object that points to
// the output file. Then transform the DOM
// sending text to the output file.
///
StreamResult xformResult =
new StreamResult(xformStream);

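//Do the transform, then close the output
// stream so that all output is flushed to
// the file.
///
transformer.transform(source,xformResult);
xformStream.close();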
}catch(Exception e){
e.printStackTrace(System.err);
}//end catch

}//end doXslTransform

}// class Dom03

Listing 23

NOTE: IT WAS NECESSARY TO MANUALLY ENTER SOME
LINE BREAKS INTO THIS DOCUMENT TO FORCE IT TO
FIT INTO THE NARROW PUBLICATION FORMAT.

<?xml version="1.0"?>



<!DOCTYPE A [
<!ELEMENT A (Q,B,B)*>
<!ELEMENT B (B | C | D | R | S | T)*>
<!ELEMENT C (#PCDATA)>
<!ELEMENT D (E | F)*>
<!ELEMENT E (#PCDATA | G)*>
<!ELEMENT F (#PCDATA)>
<!ELEMENT G (#PCDATA)>
<!ELEMENT Q (#PCDATA)>
<!ELEMENT R (#PCDATA)>
<!ELEMENT S (#PCDATA)>
<!ELEMENT T (#PCDATA)>
]>

<?xml-stylesheet type="text/xsl"
href="Dom03.xsl"?>

<A>
<Q>A Big Header</Q>

<B>
<C>Text block 1.</C>

<R>A Mid Header</R>

<C>Text block 2.</C>


<?processor ProcInstr="Dummy"?>

<S>A Small Header</S>
<B>
<C>Text block 3.</C>
</B>

<S>Another Small Header</S>
<B>
<C>Text block 4.</C>

<T>A Smallest Header</T>
<B>
<C>Text block 5.</C>

<D>
<E>First list item in E
<G>Nested G text element</G>
</E>
<F>First list item in F</F>
<E>Second list item in E</E>
<F>Second list item in F</F>
<E>Third list item in E</E>
<F>Third list item in F</F>
</D>

<C>Text block 6.</C>
</B>
<C>Text block 7.</C>
</B>

<R>Another Mid Header</R>
<C>Text block 8.</C>
</B>

<B>
<R>Another Mid Header in Another B</R>
<C>Text block 9.</C>
</B>
</A>

Listing 24

NOTE: IT WAS NECESSARY TO MANUALLY ENTER SOME
LINE BREAKS INTO THIS DOCUMENT TO FORCE IT TO
FIT INTO THE NARROW PUBLICATION FORMAT.

<?xml version='1.0'?>



<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >

<xsl:output method="xml"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />


<xsl:template match="/">



<html>

<head>
<meta http-equiv="content-type"
content="text/html; charset=UTF-8"/>
<title>Generated XHTML file</title>
</head>

<body>

<table border="2" cellspacing="0"
cellpadding="0" width="330"
bgcolor="#FFFF00" >
<tr>
<td>

<xsl:apply-templates/>
</td>
</tr>
</table>
</body>

</html>

</xsl:template>


<xsl:template match="B">
<xsl:apply-templates />
</xsl:template>


<xsl:template match="C">
<p>
<xsl:apply-templates />
</p>
</xsl:template>


<xsl:template match="D">List of items in E
<ul>

<xsl:for-each select="E">
<li>
<xsl:apply-templates />
</li>
</xsl:for-each>

</ul>List of items in F
<ol>

<xsl:for-each select="F">
<li>
<xsl:apply-templates />
</li>
</xsl:for-each>

</ol>

</xsl:template>


<xsl:template match="G">
<b>
<xsl:apply-templates />
</b>
</xsl:template>



<xsl:template match="Q">
<h1>
<xsl:apply-templates />
</h1>
</xsl:template>


<xsl:template match="R">
<h2>
<xsl:apply-templates />
</h2>
</xsl:template>


<xsl:template match="S">
<h3>
<xsl:apply-templates />
</h3>
</xsl:template>


<xsl:template match="T">
<h4>
<xsl:apply-templates />
</h4>
</xsl:template>


</xsl:stylesheet>

Listing 25

About the author

Richard Baldwin is a college professor (at Austin Community College in Austin, TX) and private consultant whose primary focus is a combination of Java, C#, and XML. In addition to the many platform and/or language independent benefits of Java and C# applications, he believes that a combination of Java, C#, and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects, and he frequently provides onsite training at the high-tech companies located in and around Austin, Texas. He is the author of Baldwin's Programming Tutorials, which has gained a worldwide following among experienced and aspiring programmers. He has also published articles in JavaPro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.

Baldwin@DickBaldwin.com

-end-