Java JAXP, Writing Java Code to Emulate an XSLT Transformation

Baldwin shows you how to write a Java program that mimics an XSLT transformation for converting an XML file into a text file.  He shows that once you have a library of Java methods that emulate XSLT elements, it is no more difficult to write a Java program to transform an XML document than it is to write an XSL stylesheet to transform the same document.

Published:  June 1, 2004
By Richard G. Baldwin

Java Programming Notes # 2208


Preface

In the previous lesson entitled Java JAXP, Implementing Default XSLT Behavior in Java, I explained default XSLT behavior, and showed you how to write Java code that mimics default XSLT behavior.  The Java program named Dom11 that I developed in that lesson serves as a skeleton for more advanced transformation programs.

This lesson updates Dom11 into a new program that tests and exercises several methods that were not tested by the samples used in the previous lesson.

I will show that once you have a library of Java methods that emulate XSLT elements, it is no more difficult to write a Java program to transform an XML document than it is to write an XSL stylesheet to transform the same document.

JAXP

JAXP is an API designed to help you write programs for creating and processing XML documents. It is a critical part of Sun's Java Web Services Developer Pack (JWSDP).

This lesson is one in a series designed to help you understand how to use JAXP and how to use the JWSDP.

The first lesson in the series was entitled Java API for XML Processing (JAXP), Getting Started.  The previous lesson was entitled Java JAXP, Implementing Default XSLT Behavior in Java.

XML

XML is an acronym for the eXtensible Markup Language.  I will assume that you already understand XML, and will teach you how to use JAXP to write programs for creating and processing XML documents.

XSL and XSLT

XSL is an acronym for Extensible Stylesheet Language.  XSLT is an acronym for XSL Transformations.

The uses of XSLT include the following:
This lesson explains an XSLT transformation along with a Java program that transforms an XML document into a text file.

Viewing tip

You may find it useful to open another copy of this lesson in a separate browser window.  That will make it easier for you to scroll back and forth among the different listings and figures while you are reading about them.

Supplementary material

I recommend that you also study the other lessons in my extensive collection of online Java and XML tutorials.  You will find those lessons published at Gamelan.com.  As of the date of this writing, Gamelan doesn't maintain a consolidated index of my tutorial lessons, and sometimes they are difficult to locate there.  You will find a consolidated index at www.DickBaldwin.com.

Preview

A tree structure in memory

A DOM parser can be used to create a tree structure in memory that represents an XML document.  In Java, that tree structure is encapsulated in an object of the interface type Document.
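As a minimal sketch of that idea (this is not code from Dom12), the following uses the standard JAXP DocumentBuilderFactory to parse a small XML string into an object of type Document.  The class name ParseSketch and the sample XML text are assumptions made only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParseSketch {
    // Parses XML text into a DOM tree. Checked parser exceptions are
    // wrapped so that callers do not need a throws clause.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input, loosely modeled on Dom12.xml.
        Document doc = parse("<top><theData><title>Java</title></theData></top>");
        // The root element is a child of the Document (root) node.
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Parsing from a file instead of a string would only change the argument passed to the parse method of the DocumentBuilder.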

Many operations are possible

Given an object of type Document (often called a DOM tree), there are many methods that can be invoked on the object to perform a variety of operations.  For example, it is possible to write Java code to:

Two ways to transform an XML document

There are at least two ways to transform the contents of an XML document into another document: by using XSLT, or by writing a Java program that operates on the DOM tree.

Advantages and disadvantages

As is usually the case, there are advantages and disadvantages to both approaches.

As an example of an advantage provided by XSLT, if it is possible to perform the required transformation using XSLT, that approach will probably require you to write less code than would be required to perform the same transformation by writing a Java program from scratch.  However, I will show that once you have a library of Java methods that emulate XSLT elements, it is no more difficult to write a Java program to transform an XML document than it is to write an XSL stylesheet to transform the same document.

Debugging XSLT can be difficult

In my opinion, it is much easier to debug a Java program than it is to debug an XSL stylesheet that doesn't work properly.  However, the use of a good XSLT debugger may resolve that difference.

Java provides more detailed control

I believe (but cannot prove) that it is possible to write Java programs to do transformations that are not possible using standard XSLT elements.  If true, this may be an advantage of Java programs over XSLT transformations.

A skeleton library of Java methods

This is one of several lessons that show you how to write the skeleton of a Java library containing methods that emulate the most common XSLT elements.  Once you have the library, writing Java code to transform XML documents consists mainly of writing a short driver program to access and use those methods.  Thus, given the proper library of methods, it is no more difficult to write a Java program to perform the transformation than it is to write an XSLT stylesheet.

Library is not my primary purpose

However, my primary purpose in these lessons is not to provide such a library, but rather is to help you understand how to use a DOM tree to create, modify, and manipulate XML documents.  By comparing Java code that manipulates a DOM tree with similar XSLT operations, you will have an opportunity to learn a little about XSLT in the process of learning how to manipulate a DOM tree using Java code.

Some Details Regarding XSLT

Assume that an XML document has been parsed to produce a DOM tree in memory that represents the XML document.

An XSLT processor starts examining the DOM tree at its root node.  It obtains instructions from the XSLT stylesheet telling it how to navigate the tree, and how to treat each node that it encounters along the way.

Finding and applying matching template rules

As each node is encountered, the processor searches the stylesheet looking for a template rule that governs how to treat nodes of that type.  If the processor finds a template rule that matches the node type, it performs the operations indicated by the template rule.  If it doesn't find a matching template rule, it executes a built-in template rule appropriate to that node.  (I explained the behavior of the built-in template rules in the previous lesson.)

Literal text in the XSLT stylesheet elements

You can think of the XSLT process as operating on an input DOM tree to produce an output DOM tree.  If the template rule being applied contains literal text, that literal text is used to create text nodes in the output tree.

Traversing child nodes

An XPath expression can be used to point to a specific node and to establish that node as the context node.  Once a context node is established, there are at least two XSLT elements that can be used to traverse the children of that node: xsl:apply-templates and xsl:for-each.

The xsl:apply-templates element

The first of these, xsl:apply-templates, examines all child nodes of the context node that match an optional select attribute.  If the optional select attribute is omitted, then all child nodes of the context node are examined.

(When combined with a default template rule, this often results in a recursive examination and processing of all descendant nodes of the context node.)

Applying template rules

As each child node is examined, it is processed using a matching template rule or a built-in template rule.
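The behavior described above can be sketched in Java roughly as follows.  This is an illustration under stated assumptions, not the library developed in this series: the names ApplyTemplatesSketch, applyTemplates, and processNode are hypothetical, the two "template rules" are hard-coded, and the built-in rules are reduced to "recurse into elements, copy text."

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ApplyTemplatesSketch {
    // Emulates xsl:apply-templates. A null select stands for an omitted
    // select attribute, in which case all child nodes are examined.
    static void applyTemplates(Node context, String select, StringBuilder out) {
        NodeList children = context.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            boolean selected = select == null
                || (child.getNodeType() == Node.ELEMENT_NODE
                    && child.getNodeName().equals(select));
            if (selected) {
                processNode(child, out);
            }
        }
    }

    // Picks a matching "template rule" or falls back to a built-in rule.
    static void processNode(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.ELEMENT_NODE && node.getNodeName().equals("top")) {
            out.append("B Match top\n");            // literal text
            applyTemplates(node, "theData", out);
        } else if (type == Node.ELEMENT_NODE
                && node.getNodeName().equals("theData")) {
            out.append("C Match theData\n");        // literal text
        } else if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
            applyTemplates(node, null, out);        // built-in: recurse
        } else if (type == Node.TEXT_NODE) {
            out.append(node.getNodeValue());        // built-in: copy text
        }
        // Comments and processing instructions produce no output.
    }

    // Parses the XML text and starts processing at the document node.
    static String transform(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            processNode(doc, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Note how the recursion arises: a rule calls applyTemplates, which calls processNode for each selected child, which may in turn call applyTemplates again.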

Iterative operation

The second XSLT element in the above list, xsl:for-each, executes an iterative examination of all child nodes of the context node that match a required select attribute.  Note that unlike with the xsl:apply-templates element, the select attribute is not optional for this element.

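The processor examines all child nodes of the context node that match the select attribute.  As each such node is examined, it becomes the context node and is processed using the template content nested inside the xsl:for-each element itself, rather than by searching the stylesheet for a matching template rule.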
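A rough Java analogue of the iterative operation is an ordinary loop over the selected children.  In XSLT, the body of xsl:for-each is itself the template that is instantiated for each selected node, so the sketch below models that body as a java.util.function.Consumer.  The names ForEachSketch, forEach, and listTitles are hypothetical.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ForEachSketch {
    // Emulates xsl:for-each for the simple case where select is a child
    // element name: runs the body once per matching child, in document order.
    static void forEach(Node context, String select, Consumer<Node> body) {
        for (Node c = context.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE
                    && c.getNodeName().equals(select)) {
                body.accept(c);
            }
        }
    }

    // Example use: one line of output per title of each theData element.
    static String listTitles(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            forEach(doc.getDocumentElement(), "theData", data ->
                forEach(data, "title", title ->
                    out.append(title.getTextContent()).append('\n')));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }
}
```

The select argument is required here, just as the select attribute is required for xsl:for-each.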

Let's see some code

I will begin by discussing the XML file named Dom12.xml (shown in Listing 25 near the end of the lesson) along with the XSL stylesheet file named Dom12.xsl (shown in Listing 26).

A Java program named Dom12

After explaining the transformation produced by applying this stylesheet to this XML document, I will explain the transformation produced by processing the XML file with a Java program named Dom12 (shown in Listing 24) that mimics the behavior of the XSLT transformation.

Discussion and Sample Code

The XML file named Dom12.xml

The XML file shown in Listing 25 is relatively straightforward.  A tree view of the XML file is shown in Figure 1.  (This XML file is both well-formed and valid.)  I used alternating colors of red and blue to identify successive nodes named theData.  The reason for doing this will become apparent later.

#document DOCUMENT_NODE
  top DOCUMENT_TYPE_NODE
  #comment COMMENT_NODE
  #comment COMMENT_NODE
  dummy-target PROCESSING_INSTRUCTION_NODE
  xml-stylesheet PROCESSING_INSTRUCTION_NODE
  false-target PROCESSING_INSTRUCTION_NODE
  top ELEMENT_NODE
    theData ELEMENT_NODE
         Attribute: attr=Dummy Attr Value
       title ELEMENT_NODE
         #text Java
         subtitle ELEMENT_NODE
             Attribute: position=Low
           #text really
           part1 ELEMENT_NODE
             #text This is part 1
           part2 ELEMENT_NODE
             #text This is part 2
         #text rules
       author ELEMENT_NODE
         #text R.Baldwin
       price ELEMENT_NODE
         #text $9.95
    theData ELEMENT_NODE
      title ELEMENT_NODE
        #text Python
      author ELEMENT_NODE
        #text R.Baldwin
      price ELEMENT_NODE
        #text $15.42
    theData ELEMENT_NODE
       title ELEMENT_NODE
         #text XML
       author ELEMENT_NODE
         #text R.Baldwin
       price ELEMENT_NODE
         #text $19.60

Figure 1

(This tree view of the XML file was produced using a program named DomTree02, which was discussed in an earlier lesson.  Note that in order to make the tree view more meaningful, I manually removed extraneous line breaks and text nodes associated with those line breaks.  The extraneous line breaks in Figure 1 were caused by extraneous line breaks in the XML file.  The extraneous line breaks in the XML file were placed there for cosmetic reasons and to force it to fit into this narrow publication format.)
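For readers who don't have DomTree02 at hand, a tree view in roughly the style of Figure 1 can be produced with a short recursive method.  The following is a sketch written for this purpose, not the DomTree02 program itself; the class name TreeViewSketch and the exact indentation are assumptions.  Unlike the manual cleanup described above, it simply skips text nodes that contain only white space.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TreeViewSketch {
    // Maps the DOM node-type code to the constant name shown in the view.
    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: return "DOCUMENT_NODE";
            case Node.DOCUMENT_TYPE_NODE: return "DOCUMENT_TYPE_NODE";
            case Node.ELEMENT_NODE: return "ELEMENT_NODE";
            case Node.COMMENT_NODE: return "COMMENT_NODE";
            case Node.PROCESSING_INSTRUCTION_NODE:
                return "PROCESSING_INSTRUCTION_NODE";
            default: return "OTHER_NODE";
        }
    }

    // Appends one line for the node, then recurses into its children.
    static void dump(Node node, String indent, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {  // skip nodes holding only line breaks
                out.append(indent).append("#text ").append(text).append('\n');
            }
            return;
        }
        out.append(indent).append(node.getNodeName()).append(' ')
           .append(typeName(node)).append('\n');
        NamedNodeMap attrs = node.getAttributes();  // null unless an element
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                out.append(indent).append("    Attribute: ")
                   .append(a.getNodeName()).append('=')
                   .append(a.getNodeValue()).append('\n');
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            dump(c, indent + "  ", out);
        }
    }

    // Parses the XML text and returns the tree view as a string.
    static String treeView(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            dump(doc, "", out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```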

A database of books

As you may already have figured out, this XML document represents a small database containing information about fictitious books.

It is important to note, however, that the structure and content of this XML file was not intended to have any purpose other than to illustrate the concepts being covered in this lesson.  In other words, some of the structure makes no sense with regard to a database containing information about books.

The XSLT Transformation

The XSL stylesheet file named Dom12.xsl

Recall that an XSL stylesheet is itself an XML file, and can therefore be represented as a tree.  Figure 2 presents an abbreviated tree view of the stylesheet shown in Listing 26.  I colored each of the five template rules in this view with alternating colors of red and blue to make them easier to identify visually.

(As is often the case with XSL stylesheets, this stylesheet file is well-formed but it is not valid.)

#document DOCUMENT_NODE
  xsl:stylesheet ELEMENT_NODE
      Attribute: xmlns:xsl=http:
                 //www.w3.org/1999/XSL/Transform
      Attribute: version=1.0
    xsl:template ELEMENT_NODE
         Attribute: match=/
       #text A Match Root
       xsl:apply-templates ELEMENT_NODE
           Attribute: select=top
    xsl:template ELEMENT_NODE
        Attribute: match=top
      #text B Match top
      xsl:apply-templates ELEMENT_NODE
          Attribute: select=theData
    xsl:template ELEMENT_NODE
         Attribute: match=theData
       #text C Match theData and show attribute
       xsl:value-of ELEMENT_NODE
           Attribute: select=@attr
       xsl:apply-templates ELEMENT_NODE
           Attribute: select=title
    xsl:template ELEMENT_NODE
        Attribute: match=title
      #text
D Match title and show value of title as context
      xsl:value-of ELEMENT_NODE
          Attribute: select=.
      #text E Show value of subtitle
      xsl:value-of ELEMENT_NODE
          Attribute: select=subtitle
      xsl:apply-templates ELEMENT_NODE
          Attribute: select=subtitle
    xsl:template ELEMENT_NODE
         Attribute: match=subtitle
       #text
 F match subtitle and show value of attribute
       xsl:value-of ELEMENT_NODE
           Attribute: select=@position
       #text
 G Show value of subtitle as context node
       xsl:value-of ELEMENT_NODE
           Attribute: select=.
Figure 2

Why abbreviated?

I refer to this as an abbreviated tree view because I manually deleted comment nodes and extraneous text nodes in order to emphasize the important elements in the stylesheet.

(Extraneous text nodes occur as a result of inserting line breaks in the original XSL document for cosmetic purposes.  Note that I also manually entered a line break in the third line of Figure 2 to force the material to fit into this narrow publication format.)

The root element

The root node of all XML documents is the document node.  In addition to the root node, there is also a root element, and it is important not to confuse the two.

As you can see from Figure 2, the root element in the XSL document is of type xsl:stylesheet.  The root element has two attributes, each of which is standard for XSL stylesheets.

The first attribute points to the XSLT namespace URI, which you can read about in the W3C Recommendation.  The second attribute provides the XSLT version.

Children of the root element node

The root element node in Figure 2 has five child nodes, each of which is a template rule.  (I discussed template rules in detail in the previous lesson.)

Each of the five child nodes of the root element node has a match pattern.  The five match patterns, in the order that they appear in Figure 2, are /, top, theData, title, and subtitle.

I will discuss each of the five template rules, but before doing that I will show you the output produced by this XSLT transformation.

(Note that the Java program discussed later produces essentially the same output as the XSLT transformation.)

The output from the transformation

The result of performing an XSLT transformation by applying the XSL stylesheet shown in Listing 26 to the XML file shown in Listing 25 is shown in Figure 3.

I will explain the operations in the XSLT transformation that produced each line of text in Figure 3.

<?xml version="1.0" encoding="UTF-8"?>

A Match Root

B Match top

C Match theData and show attribute
Dummy Attr Value
D Match title and show value of title as context
Java
really
This is part 1
This is part 2
rules
E Show value of subtitle
really
This is part 1
This is part 2

F match subtitle and show value of attribute
Low
G Show value of subtitle as context node
really
This is part 1
This is part 2

C Match theData and show attribute

D Match title and show value of title as context
Python
E Show value of subtitle

C Match theData and show attribute

D Match title and show value of title as context
XML
E Show value of subtitle

Figure 3

(Note that I manually deleted a couple of extraneous line breaks from the output shown in Figure 3.)

The first line of text

The first line of text in the output shown in Figure 3 is an XML declaration that is produced automatically by the XSLT transformer available with JAXP.
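If the XML declaration is not wanted, the standard JAXP output property OutputKeys.OMIT_XML_DECLARATION can be used to suppress it.  The following sketch shows one way to run a transformation with that property set.  The class name XsltRunSketch and the one-rule stylesheet held in SAMPLE_XSL are assumptions made for illustration; they are not the Dom12 files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltRunSketch {
    // Hypothetical one-rule stylesheet that emits only literal text.
    static final String SAMPLE_XSL =
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " version=\"1.0\">"
        + "<xsl:template match=\"/\">A Match Root</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result.
    static String transform(String xml, String xsl, boolean omitDeclaration) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            transformer.setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, omitDeclaration ? "yes" : "no");
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<top/>", SAMPLE_XSL, true));
    }
}
```

An alternative that stays entirely within the stylesheet is to declare xsl:output with method="text", which also produces output without an XML declaration.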

(Note, however, that the existence of this line of text doesn't cause the document to be an XML document.  This document cannot be parsed as an XML document.  An attempt to do so results in various parser errors.)

The first template rule

The first template rule (extracted from Figure 2) is shown in tree view in Figure 4.  This template rule contains an XPath expression that matches the document root (note the forward slash).

    xsl:template ELEMENT_NODE
        Attribute: match=/
      #text A Match Root
      xsl:apply-templates ELEMENT_NODE
          Attribute: select=top

Figure 4

Listing 1 shows the same template rule in XSL format (extracted from Listing 26).

<xsl:template match="/">
A Match Root
<xsl:apply-templates select="top" />
</xsl:template>

Listing 1

What is the effect of a literal text node?

This template rule contains a literal text node, which is highlighted in red in Figure 4 and Listing 1.

When an XSL stylesheet is used to perform an XSLT transformation on an XML file, any text nodes that exist in the XSL stylesheet are reproduced in the output tree.  As a result, the output contains the text shown in Figure 5 (extracted from the top of Figure 3 above).  Note that the text in the output matches the text node in the stylesheet.

A Match Root

Figure 5

<xsl:apply-templates select="top" />

Note that the context node at this point in the process is the document node.  The literal text node in Listing 1 is followed by an xsl:apply-templates element with a select attribute value of top.  This instructs the XSLT processor to search out all child nodes of the document node whose names are top, and to apply one of the following template rules to each of those nodes: a template rule in the stylesheet that matches top, if one exists, or a built-in template rule otherwise.

Figure 1 shows that the root element node for the XML file is named top.  Since it is the root element node, there can be only one such node as a child of the document node.  That is the node that gets processed by the XSLT processor.

A template rule that matches top

The tree view fragment of the XSL file shown in Figure 6 shows that the stylesheet does contain a template rule that matches top.

    xsl:template ELEMENT_NODE
        Attribute: match=top
      #text B Match top
      xsl:apply-templates ELEMENT_NODE
          Attribute: select=theData

Figure 6

(The template rule in Figure 6 was extracted from Figure 2.  It is the first blue template rule in Figure 2.)

Listing 2 shows the XSL code fragment that corresponds to the tree view of the template rule shown in Figure 6.

<xsl:template match="top">
B Match top
<xsl:apply-templates select="theData" />
</xsl:template>

Listing 2


Another literal text node


Once again, the template rule contains a literal text node (highlighted in red), which passes through to the output shown in Figure 3.  You should have no difficulty identifying this literal text in the third line of the output shown in Figure 3.

<xsl:apply-templates select="theData" />

At this point, the context node is the node named top.  This template rule also contains an xsl:apply-templates element immediately following the literal text.  In this case, the value of the select attribute is theData.

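This element instructs the XSLT processor to search out all child nodes of top named theData and to apply one of the following template rules to each of those child nodes: a template rule in the stylesheet that matches theData, if one exists, or a built-in template rule otherwise.

Three child nodes named theData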

Figure 1 shows that top has three child nodes named theData.

(I colored those three nodes in alternating colors of red and blue in Figure 1 to make them easier to identify.)

As you can see in Figure 1, the first node named theData is somewhat more complex than the other two nodes with the same name.  I purposely made it more complex to illustrate several concepts that I will cover in this lesson.

A template rule that matches theData

Referring back to the tree view in Figure 2, we see that the stylesheet does have a template rule that matches theData.  That fragment of the style sheet tree view is extracted from Figure 2 and reproduced in Figure 7 below.

    xsl:template ELEMENT_NODE
        Attribute: match=theData
      #text C Match theData and show attribute
      xsl:value-of ELEMENT_NODE
          Attribute: select=@attr
      xsl:apply-templates ELEMENT_NODE
          Attribute: select=title

Figure 7

The corresponding stylesheet code fragment is shown in Listing 3.  In both cases, a literal text node in the stylesheet is highlighted in red.

<xsl:template match="theData">
C Match theData and show attribute
<xsl:value-of select="@attr" />
<xsl:apply-templates select="title" />
</xsl:template>

Listing 3


Literal text in the output


As always, the text node in the template rule is reproduced in the output.  You should be able to identify this text in the fourth line of output text in Figure 3.

A more complex template rule

This template rule is a little more complex than those discussed previously.  In particular, this template rule has two XSLT elements following the literal text.

<xsl:value-of select="@attr" />

The first element following the literal text in Listing 3 is an element that instructs the XSLT processor to get the value of an XML attribute named attr (belonging to the context node) and to cause that value to become a text node in the output.

The item for which the value is to be obtained is specified by the value of the XSL attribute named select.  The fact that the value of the XSL attribute begins with @ specifies that the target is an attribute in the XML file belonging to the context node.
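In DOM terms, the closest counterpart to this element is the getAttribute method of the Element interface, which returns an empty string when the attribute is absent (much as xsl:value-of contributes nothing in that case).  The sketch below illustrates this under stated assumptions: the names AttributeValueSketch, valueOfAttribute, and attrValues are hypothetical, and the sample input is a cut-down stand-in for Dom12.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeValueSketch {
    // Emulates an xsl:value-of whose select value is "@" plus the given
    // name, with the given element acting as the context node.
    static String valueOfAttribute(Element context, String name) {
        return context.getAttribute(name);  // "" if there is no such attribute
    }

    // Returns the value of attr for every theData element, one per line.
    static String attrValues(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList list = doc.getElementsByTagName("theData");
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < list.getLength(); i++) {
                out.append(valueOfAttribute((Element) list.item(i), "attr"))
                   .append('\n');
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Only the attribute value is returned; the attribute name never appears in the result.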

Following the execution thread

I am currently following the execution thread in discussing the transformation.  At this point in the process, the context node is the first XML node named theData.

Referring back to Figure 1, you can see that the first XML node named theData has an attribute named attr whose value is "Dummy Attr Value".

Figure 8 shows a recap of the output down to and including the value of the XML attribute named attr.  Note that only the value of the XML attribute appears in the output.  The name of the XML attribute does not appear in the output.

<?xml version="1.0" encoding="UTF-8"?>

A Match Root

B Match top

C Match theData and show attribute
Dummy Attr Value
...

Figure 8

<xsl:apply-templates select="title" />

The second element inside the template rule shown in Listing 3 instructs the XSLT processor to search for all nodes named title that are children of the context node.  As each such child node is encountered, the processor is to apply a template rule that matches title, or a built-in template rule if there is no matching template rule.

A template rule that matches title

Referring back to the stylesheet tree view in Figure 2, we see that the stylesheet does have a template rule that matches title.  That fragment of the tree view was extracted from Figure 2 and is reproduced in Figure 9 below.

    xsl:template ELEMENT_NODE
        Attribute: match=title
      #text
D Match title and show value of title as context
      xsl:value-of ELEMENT_NODE
          Attribute: select=.
      #text E Show value of subtitle
      xsl:value-of ELEMENT_NODE
          Attribute: select=subtitle
      xsl:apply-templates ELEMENT_NODE
          Attribute: select=subtitle

Figure 9

The corresponding stylesheet code fragment

The corresponding stylesheet code fragment is shown in Listing 4.  Literal text nodes in the stylesheet are highlighted in red in both views.  Note that in this case there are two separate text nodes in the template rule separated by an xsl:value-of element.

<xsl:template match="title">
D Match title and show value of title as context
<xsl:value-of select="." />
E Show value of subtitle
<xsl:value-of select="subtitle" />
<xsl:apply-templates select="subtitle" />
</xsl:template>

Listing 4

You should have no difficulty identifying the result of the first text node in the sixth line of text in the output in Figure 3.

The template rule shown in Listing 4 is considerably more complex than those shown previously.

<xsl:value-of select="." />

This is the first XSLT element following the first text node in Listing 4.  A select value of "." specifies the context node, which in this case is an element named title.  (Note that my discussion is still following the thread of execution.)  As such, this will be the element named title belonging to the first XML element named theData in the XML document represented by the tree view in Figure 1.

I have extracted that tree view fragment of the XML document from Figure 1 and reproduced it in Figure 10 below with the XML text nodes highlighted in green.

      title ELEMENT_NODE
         #text Java
        subtitle ELEMENT_NODE
             Attribute: position=Low
           #text really
          part1 ELEMENT_NODE
             #text This is part 1
           part2 ELEMENT_NODE
             #text This is part 2
         #text rules

Figure 10

Get concatenated text values

As you will see shortly, this XSLT element instructs the processor to get (and send to the output) the concatenated text values of the context node and all of its descendant nodes.
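The concatenation can be sketched in Java as a recursive walk that appends the value of every text node in document order.  This is an illustrative sketch, with the hypothetical names StringValueSketch, stringValue, and valueOfRoot; for element nodes, the standard DOM Level 3 method getTextContent gives the same result.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class StringValueSketch {
    // Returns the concatenated text of the node and all of its descendants,
    // in document order. Comments and processing instructions add nothing.
    static String stringValue(Node node) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            return node.getNodeValue();
        }
        StringBuilder sb = new StringBuilder();
        if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                sb.append(stringValue(c));
            }
        }
        return sb.toString();
    }

    // Parses the XML text and returns the string value of its root element.
    static String valueOfRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return stringValue(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Because the sample input used to exercise a method like this would normally contain line breaks between elements, real output also includes that white space, which is why the values appear on separate lines in the figures.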

The descendant element nodes of the node named title in Figure 10 are subtitle, part1, and part2.

Each of the descendant nodes (shown in Figure 10) contains a text node.  In addition, the node named title contains two separate text nodes, separated in the XML file by the node named subtitle.

(The order of the text nodes and the descendant element nodes is important.)

Recap the output

Figure 11 shows a recap of the output up to this point in the execution thread, with the red output in Figure 11 matching the concatenated green text node values of title and all its descendants in Figure 10.

(Note that the order in which the text node values are concatenated matches the order in which the nodes occur in the XML document.)

<?xml version="1.0" encoding="UTF-8"?>

A Match Root

B Match top

C Match theData and show attribute
Dummy Attr Value
D Match title and show value of title as context
Java
really
This is part 1
This is part 2
rules
 ...

Figure 11

Another XSL text node

The next thing in the template rule shown in Listing 4 is another XSL text node, which will be reproduced in the output.  (This text node is also colored red in Listing 4.)  You should have no difficulty identifying this text node in the output in Figure 3.

<xsl:value-of select="subtitle" />

The second text node in Listing 4 is followed by another xsl:value-of element, but this time with a different value for the select attribute.  A select value of "subtitle" instructs the XSLT processor to get (and send to the output) the concatenated text values of a child node named subtitle and all of its descendants.

(The context node at this point is still the node named title, so the processor is looking for a node named subtitle as a child of title.

It is easy to demonstrate that if there are two or more child nodes with that name, only the first one found is processed.  The others are ignored.  This agrees with the XPath 1.0 rule that converting a node-set to a string yields the string-value of the node that is first in document order.)
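A Java method that emulates this behavior therefore has to stop at the first matching child.  The following sketch shows one way to do that; the names FirstChildSketch, valueOfChild, and valueOfChildOfRoot are hypothetical and are not taken from Dom12.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FirstChildSketch {
    // Emulates an xsl:value-of whose select value is a child element name:
    // returns the text of the FIRST matching child and ignores any others.
    static String valueOfChild(Node context, String name) {
        for (Node c = context.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE
                    && c.getNodeName().equals(name)) {
                return c.getTextContent();
            }
        }
        return "";  // no matching child: nothing is contributed to the output
    }

    // Parses the XML text and applies valueOfChild to its root element.
    static String valueOfChildOfRoot(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return valueOfChild(doc.getDocumentElement(), name);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Returning an empty string for the no-match case mirrors what happens for the second and third title nodes, which have no subtitle child.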

Figure 12 shows a fragment from Figure 1 showing the XML node named subtitle and its descendant nodes belonging to the first XML node named theData.  Once again, I colored the text node values green in Figure 12.

        subtitle ELEMENT_NODE
             Attribute: position=Low
           #text really
          part1 ELEMENT_NODE
             #text This is part 1
           part2 ELEMENT_NODE
             #text This is part 2

Figure 12

Recap the output

Figure 13 shows the output up to this point in the execution thread with the red output in Figure 13 corresponding to the concatenated green text node values in Figure 12.

<?xml version="1.0" encoding="UTF-8"?>

A Match Root

B Match top

C Match theData and show attribute
Dummy Attr Value
D Match title and show value of title as context
Java
really
This is part 1
This is part 2
rules
E Show value of subtitle
really
This is part 1
This is part 2
...

Figure 13

<xsl:apply-templates select="subtitle" />

The last XSLT element in the template rule in Listing 4 is an xsl:apply-templates element with the value of the select attribute being subtitle.

At this point in the execution stream, the context node is a node named title.  This element instructs the processor to search for all child nodes of title named subtitle.  As usual, when a matching node is found, one of the following two template rules will be applied to that node: a matching template rule from the stylesheet, or a built-in template rule if there is no matching template rule.

A template rule matching subtitle

The final template rule from Figure 2 is reproduced below.  This template rule matches subtitle.

    xsl:template ELEMENT_NODE
        Attribute: match=subtitle
      #text
F match subtitle and show value of attribute
      xsl:value-of ELEMENT_NODE
          Attribute: select=@position
      #text
G Show value of subtitle as context node
      xsl:value-of ELEMENT_NODE
          Attribute: select=.

Figure 14

(Note that even though I arranged the template rules in the stylesheet in the order that I wanted to discuss them, the order of the template rules in the stylesheet is immaterial.  I could completely rearrange them and the results would be the same.)

The corresponding stylesheet fragment

Listing 5 shows a fragment of the XSL stylesheet that corresponds to the tree view of the template rule in Figure 14.  Once again, in both cases, text nodes in the stylesheet are highlighted in red.

<xsl:template match="subtitle">
F match subtitle and show value of attribute
<xsl:value-of select="@position" />
G Show value of subtitle as context node
<xsl:value-of select="." />
</xsl:template>

Listing 5


You should have no difficulty identifying the first text node in Listing 5 as it appears in Figure 3.

<xsl:value-of select="@position" />

The element following the first text node in Listing 5 is an xsl:value-of element that instructs the processor to get the value of an XML attribute named position belonging to the context node.  (I discussed an element like this earlier.)

Figure 1 shows this attribute to have a value of Low in the subtitle node belonging to the title node, which in turn belongs to the first node named theData.  The word Low appears at the appropriate location in the output shown in Figure 3.

Another XSL text node

The next item in the template rule in Listing 5 is another XSL text node.  This text also appears at the appropriate location in the output in Figure 3.

<xsl:value-of select="." />

The last element in the template rule shown in Listing 5 instructs the processor to get the concatenated text value of the context node and all its descendants.  (I also discussed an element like this earlier.)

Continuing with the execution thread, the context node at this point is still the subtitle node belonging to the title node, which in turn belongs to the first node named theData in Figure 1.  A tree view fragment of that node, extracted from Figure 1, is shown in Figure 15.  The text nodes belonging to subtitle, part1, and part2 are highlighted in green in Figure 15.

        subtitle ELEMENT_NODE
             Attribute: position=Low
           #text really
          part1 ELEMENT_NODE
             #text This is part 1
           part2 ELEMENT_NODE
             #text This is part 2

Figure 15

Recap the output

Figure 16 shows the output up to this point in the execution thread.  The concatenated text values highlighted in red in Figure 16 correspond to the text values highlighted in green in Figure 15.

<?xml version="1.0" encoding="UTF-8"?>

A Match Root

B Match top

C Match theData and show attribute
Dummy Attr Value
D Match title and show value of title as context
Java
really
This is part 1
This is part 2
rules
E Show value of subtitle
really
This is part 1
This is part 2

F match subtitle and show value of attribute
Low
G Show value of subtitle as context node
really
This is part 1
This is part 2
...

Figure 16

The same portion of the tree from different viewpoints

Figure 16 also shows some output text highlighted in blue that is identical to that highlighted in red.  (The blue text is output text that was discussed earlier.)

The blue output in Figure 16 was produced by the following XSLT element that appears in Listing 4 where the context node was title:

<xsl:value-of select="subtitle" />

The red output text in Figure 16 was produced by the following XSLT element that appears in Listing 5 where the context node was subtitle:

<xsl:value-of select="." />

Both XSLT elements refer to the same portion of the tree, but from different viewpoints.  The first XSLT element refers to the subtitle node from the viewpoint of its parent named title.  The second XSLT element refers to the subtitle node from the viewpoint of the subtitle node itself.
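The two viewpoints can be checked directly with the javax.xml.xpath API that ships with current Java releases.  The sketch below evaluates "subtitle" with title as the context node and "." with subtitle as the context node, and returns both strings so that they can be compared.  The class name ViewpointSketch, the method bothViews, and the path /top/theData/title used to locate the first title node are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ViewpointSketch {
    // Returns { value seen from title, value seen from subtitle itself }.
    static String[] bothViews(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Locate the first title node and its first subtitle child.
            Node title = (Node) xpath.evaluate(
                "/top/theData/title", doc, XPathConstants.NODE);
            Node subtitle = (Node) xpath.evaluate(
                "subtitle", title, XPathConstants.NODE);

            // Same subtree, addressed from two different context nodes.
            String fromParent = xpath.evaluate("subtitle", title);
            String fromSelf = xpath.evaluate(".", subtitle);
            return new String[] { fromParent, fromSelf };
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }
}
```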

End of the recursion

Note that the template rule shown in Listing 5 contains only text nodes and xsl:value-of elements.  There are no xsl:apply-templates or xsl:for-each elements.  Thus, there are no instructions for the XSLT processor to continue drilling down into the depths of the DOM tree.  As a result, the recursive process works its way back toward the root of the tree.

The nodes named author and price

Referring back to Figure 1, we see that the first node named theData has two more child nodes, named author and price, that haven't been processed yet.  A tree view fragment showing those two nodes, extracted from Figure 1, is reproduced in Figure 17.

      author ELEMENT_NODE
        #text R.Baldwin
      price ELEMENT_NODE
        #text $9.95

Figure 17

What do they contribute to the output?

In order for these two nodes to contribute anything to the output, something in the XSL stylesheet must cause each of them to become the context node at some point in the process.

However, an examination of the five template rules in Figure 2 reveals that none of the template rules will cause either of these nodes to become the context node at any point in the process.  Therefore, they cannot contribute to the output.

Summary of the five template rules

The first template rule shown in Figure 2, Figure 4, and Listing 1 matches the root (document) node and causes templates to be applied to nodes named top.

The second template rule shown in Figure 2, Figure 6, and Listing 2 matches nodes named top and causes templates to be applied to nodes named theData.

The third template rule shown in Figure 2, Figure 7, and Listing 3 matches nodes named theData and causes templates to be applied to nodes named title.

(This might be the most likely place to find something in the stylesheet that would cause the nodes named author and price to become context nodes, but that doesn't happen.  The template rule that matches their parent, theData, simply ignores the child nodes named author and price.)

The fourth template rule shown in Figure 2, Figure 9, and Listing 4 matches title, (which is a sibling of the nodes named author and price), and causes templates to be applied to subtitle.

Finally, the fifth template rule shown in Figure 2, Figure 14, and Listing 5 matches subtitle and doesn't cause template rules to be applied to any other nodes.  Thus, it signals the end of the traversal down one leg of the DOM tree.

Not necessary to contribute to the output

Therefore, this XSLT transformation completely ignores the nodes named author and price, and they do not contribute anything to the output.

The main point is that it is not necessary for everything in an XML document to contribute to the output of an XSLT transformation.  The author of the stylesheet can pick and choose among the nodes in the DOM tree that will be used to produce nodes in the output tree.

Completes processing of first node named theData

That completes the processing of the first node in Figure 1 named theData.  Figure 16 shows all of the output produced by processing that node.

Referring back to Figure 6, we see an xsl:apply-templates element instructing the XSLT processor to apply templates to all nodes named theData that are children of the node named top.  So far, only one such node named theData has been processed.  Referring to Figure 1, we see that there are two more nodes named theData waiting to be processed.

The second node named theData

The second node named theData was extracted from Figure 1 and reproduced in Figure 18.

    theData ELEMENT_NODE
      title ELEMENT_NODE
        #text Python
      author ELEMENT_NODE
        #text R.Baldwin
      price ELEMENT_NODE
        #text $15.42
Figure 18

Comparing Figure 18 with the first node named theData in Figure 1 reveals that the second node named theData is much simpler than the first node named theData.  In particular, the title node in Figure 18 doesn't have any children, whereas the title node in Figure 10 has one child (subtitle) and two grandchildren (part1 and part2).

Furthermore, we also know by now that the nodes named author and price in Figure 18 will be completely ignored by the XSLT processor.

Won't explain the processing in detail

Given all of that, it shouldn't be necessary for me to explain the processing in detail for this node.  The processing proceeds as before, and produces the output shown in Figure 19.

C Match theData and show attribute

D Match title and show value of title as context
Python
E Show value of subtitle

...

Figure 19

A couple of things in Figure 19 are worthy of note.

No attribute named attr

To begin with, unlike the first node named theData, the second node named theData doesn't have an attribute named attr.  Therefore, unlike the output shown in Figure 16, the value of that attribute is blank in Figure 19.

(See the template rule in Figure 7 that selects the value of the attribute named attr.)

No subtitle, part1, or part2 descendants

Also, unlike the first node named theData, the second node named theData doesn't have descendants named subtitle, part1, or part2.  Therefore, all the output contributed by those descendant nodes to the output in Figure 16 is missing from Figure 19.

One more node named theData

An examination of Figure 1 shows that there is one more node named theData waiting to be processed.  However, except for the text values of the child nodes named title, author, and price, it is identical to the second node named theData, which was discussed above.  Therefore, a further discussion of the final node named theData is not warranted.

The Java Code Transformation

Now let's change direction and concentrate on Java code rather than XSLT elements.  The following paragraphs describe a Java program named Dom12, which emulates the XSLT transformation described above.

This program is an update of the program named Dom11 from the previous lesson.  This updated program is designed to test and exercise features of various methods that were not tested by the sample used with Dom11.

Mainly, this program adds code to the processNode method to simulate the template rules in the XSL file named Dom12.xsl.

Also, as was the case in the previous lesson, this program implements six built-in template rules for an XML processor.

Instructions for creating a custom template rule

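To create a custom template rule for this program, go to the case for the appropriate node type in the switch statement in the processNode method, replace the literal false in the conditional clause of the if statement with an expression that identifies the nodes to be matched, and place the code for the rule in the body of that if statement.

If the modified conditional clause evaluates to true, the custom rule will be executed.  If the modified conditional clause evaluates to false, the default rule will be executed.  You will see examples of several custom template rules in this program.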

Behavior of the program

This program compares the transformation of a specified XML file into a result file, using two different approaches:
  1. An XSLT style sheet and transformation, as discussed above.
  2. Program code that emulates the behavior of the XSLT transformation.
In particular, this program illustrates Java code that emulates the XSLT templates in the file named Dom12.xsl.

Usage instructions

The program requires three command line arguments in the following order:
  1. The name of the input XML file - must be Dom12.xml.
  2. The name of the output file to be produced by the XSLT transformation.
  3. The name of the output file to be produced by the program code that emulates the XSLT transformation.
The name of the XSL stylesheet file is extracted from the processing instruction in the XML file, but you could easily modify the program to obtain the name of that file from a command-line argument.
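
One way to extract a stylesheet name from a processing instruction can be sketched as follows.  This is a hedged illustration rather than the code used in Dom12.  It assumes the XML file carries a standard xml-stylesheet processing instruction with an href pseudo-attribute, and the class name, method name, and regular expression are all hypothetical.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class StylesheetPiSketch {
    // Matches href="..." or href='...' inside the processing instruction data.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1");

    // Return the href of the first xml-stylesheet processing instruction
    // at the document level, or null if there is none.
    static String stylesheetHref(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.PROCESSING_INSTRUCTION_NODE) {
                    continue;
                }
                ProcessingInstruction pi = (ProcessingInstruction) n;
                if (pi.getTarget().equals("xml-stylesheet")) {
                    Matcher m = HREF.matcher(pi.getData());
                    if (m.find()) {
                        return m.group(2);
                    }
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml-stylesheet type=\"text/xsl\" href=\"Dom12.xsl\"?><top/>";
        System.out.println(stylesheetHref(xml));
    }
}
```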

Order of execution

The program begins by executing code to transform the incoming XML file in a way that mimics the XSLT Transformation.  Along the way, it saves the processing instructions containing the ID of the stylesheet file for use by the XSLT transformation process later.  Otherwise, the code that performs the XSLT transformation would have to search the DOM tree for the XSL stylesheet file.

Then the program uses the XSLT style sheet to transform the XML file into a result file by performing an XSLT transformation under program control.
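
Performing an XSLT transformation under program control can be sketched with the standard JAXP transformation classes.  The following is a minimal sketch, not the doXslTransform method from Dom12.  It assumes that the XML and the stylesheet are supplied as strings rather than files, and the class name, method name, and tiny stylesheet are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XslTransformSketch {
    // Parse the XML into a DOM tree, then transform that tree
    // with the given stylesheet and return the result as a string.
    static String transform(String xml, String xsl) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<top><theData><title>Java</title></theData></top>";
        String xsl =
            "<xsl:stylesheet version=\"1.0\" "
            + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:value-of select=\"top/theData/title\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        System.out.println(transform(xml, xsl));
    }
}
```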

Errors, exceptions, and testing

No effort was made to provide meaningful information about errors and exceptions.

The program was tested using SDK 1.4.2 under WinXP.

Will discuss in fragments

I will discuss this program in fragments.  A complete listing of the program is shown in Listing 24 near the end of the lesson.

Much of the code in this program is very similar to, or identical to code that I discussed in the previous lesson.  I will discuss that repetitious code only briefly, if at all.

The main method

Listing 6 shows an abbreviated version of the beginning of the class named Dom12 and the ending of the main method.

public class Dom12{
  //Code deleted for brevity

      //In main method
      //Process the DOM tree
      thisObj.processDocumentNode(document);

      //Perform XSLT transformation
      thisObj.doXslTransform(
                   document,argv[1],procInstr);

    //Exception handling code deleted for brevity
  }// end main()

Listing 6


The code in this portion of the program is identical to code that I discussed in detail in the previous lesson, so I won't discuss it further.  I included it here solely to establish the context for discussion of code that is to follow.

Behavior of this code

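Briefly, the portion of the main method shown in Listing 6 invokes the processDocumentNode method to process the DOM tree, and then invokes the doXslTransform method to perform the XSLT transformation.

The methods named processDocumentNode and doXslTransform are methods of my own design.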

The processDocumentNode method

The entire processDocumentNode method is shown in Listing 7.

  void processDocumentNode(Node node){
    out.println("<?xml version=\"1.0\" "
                    + "encoding=\"UTF-8\"?>");
    out.println("A Match Root");

    //Go process the root (document) node.
    processNode(node);

    out.flush();
  }//end processDocumentNode

Listing 7


This method is used to produce any text required in the output at the document level, such as the XML declaration for an XML document.  As you can see from Listing 7, the code in this method writes an XML declaration into the output.

In addition, the code in Listing 7 produces output text that matches the literal text node in the XSL stylesheet shown in Figure 4 and Listing 1.

Both of these lines of text can be seen near the top of the XSLT output in Figure 3.

Invoke the processNode method

Despite the name that I chose to give to the processDocumentNode method, it doesn't actually process the document node directly.  Rather, after sending any required text to the output, it invokes the method named processNode to actually process the document node.

(Note that the Document object's reference is passed to the method named processNode in Listing 7.)

When the DOM tree has been processed ...

When the processNode method returns, (after the entire DOM tree has been processed), the processDocumentNode method flushes the output stream and returns control to the main method. 

As you saw in Listing 6, code in the main method then invokes the doXslTransform method to cause an XSLT transformation using the stylesheet to take place.

The processNode method

As you learned in the previous lesson, there are seven possible types of nodes in an XML document:
  1. root or document node
  2. element node
  3. attribute node
  4. text node
  5. comment node
  6. processing instruction node
  7. namespace node
The processNode method handles the first six types and ignores namespace nodes.

(Apparently it is not possible to handle namespace nodes in a Java program because there is no constant in the Node interface that can be used to identify namespace nodes.  This will become clear as we examine the code in the processNode method.)

Get and save the node type

The processNode method in this program contains quite a few changes relative to the program that I discussed in the previous lesson.  In fact, this is where most of the changes occur in this program.  (The only other change is the addition of one line of code to the processDocumentNode method.) Therefore, I will discuss the processNode method in detail.

Code that you write in this method (and in the processDocumentNode method discussed above) is somewhat analogous to writing an XSL stylesheet to be used in an XSLT transformation.

Test for a valid node, and get its type

The beginning of the processNode method is shown in Listing 8.  The method receives an incoming parameter of type Node, which can represent any of the seven types of nodes in the above list.

As you can see in Listing 8, if the parameter doesn't point to an actual object, the method prints a message on System.err and returns, as opposed to throwing a NullPointerException.

  void processNode(Node node){

    try{
      if (node == null){
        System.err.println(
                  "Nothing to do, node is null");
        return;
      }//end if

      //Get the actual type of the node
      int type = node.getNodeType();

Listing 8

The final statement in Listing 8 invokes the getNodeType method to get and save the type of the node whose reference was received as an incoming parameter.

Process the node

Each time the processNode method is invoked, it receives a Node object's reference as an incoming parameter.  The code in Listing 8 determines the type of the incoming node.  Listing 9 shows the beginning of a switch statement that is used to initiate the processing of each incoming node based on its type.

      switch (type){
        case Node.DOCUMENT_NODE:{
          if(false){
            //unreachable in this program
          }else{//invoke default behavior
            defElOrRtNodeTemp(node);
          }//end else
          break;
        }//end case DOCUMENT_NODE

Listing 9

The switch statement has six cases to handle six types of nodes, plus a default case to ignore namespace nodes.

The DOCUMENT_NODE case

The code in Listing 9 will be executed whenever the incoming method parameter points to a document node.

(Note that this will happen only once during the processing of a DOM tree.  The first node processed will always be the document node, and there is only one document node in a DOM tree.)

DOCUMENT_NODE is a constant (public static final variable) that is defined in the Node interface.  (The interface provides similar constants for all node types other than namespace nodes.)  These constants can be used to distinguish between different node types.
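
The use of these constants to distinguish node types can be sketched as follows.  This is a minimal illustration and not part of Dom12.  The class name, method names, and the short type labels are hypothetical.  The sketch walks a small DOM tree in document order and records the type of each node that it visits.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTypeSketch {
    // Map the constants defined in the Node interface to short labels.
    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: return "document";
            case Node.ELEMENT_NODE: return "element";
            case Node.ATTRIBUTE_NODE: return "attribute";
            case Node.TEXT_NODE: return "text";
            case Node.COMMENT_NODE: return "comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "processing-instruction";
            default: return "other";
        }
    }

    // Visit the node and then its children, appending one label per node.
    static void walk(Node node, StringBuilder sb) {
        if (sb.length() > 0) {
            sb.append(' ');
        }
        sb.append(typeName(node));
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, sb);
        }
    }

    static String typesInDocumentOrder(String xml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder sb = new StringBuilder();
            walk(doc, sb);
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(typesInDocumentOrder("<top a=\"1\"><!--c-->text</top>"));
    }
}
```

Note that the attribute named a never shows up in the walk, because attribute nodes are not children of their element.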

Will invoke default behavior in this case

The code in the case in Listing 9 is an if else construct.  If the conditional clause in the if statement evaluates to true (which is not possible in this case because it is set to the literal value false), the code in the if statement will be executed.  (As you will see later, this is where I place the code for custom template rules.)

If the conditional clause in the if statement does not evaluate to true, the code in the else statement will be executed.  (This is where I have placed the code that mimics the built-in template rules.  This was explained in detail in the previous lesson.)

Note that the code in the else statement in Listing 9 invokes a method named defElOrRtNodeTemp.  The behavior of this method mimics one of the built-in template rules that I explained in the previous lesson.  That method has not changed since the previous lesson.  Therefore, I won't discuss it in this lesson.  You will find the method in Listing 24 near the end of this lesson.

Creating custom template rules

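Although this lesson does not create a custom template rule for document nodes, the process for creating one is to replace the literal false in the conditional clause of the if statement with an expression that identifies the node to be matched, and to place the code for the custom rule in the body of the if statement.

If the modified conditional clause evaluates to true, the custom template rule will be executed.  If it evaluates to false, the default rule will be executed.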
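
The idea of a modified conditional clause can be sketched as follows.  This is one possible reading of the if(false) construct in Listing 9, not code from Dom12.  The match on nodes named price, the returned strings, and the defaultRule method are hypothetical stand-ins chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CustomRuleSketch {
    static String processNode(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                // The literal false has been replaced by a match condition.
                if (node.getNodeName().equals("price")) {
                    // Body of the custom template rule.
                    return "custom rule: " + node.getNodeName();
                } else {
                    // Fall back to the default behavior.
                    return defaultRule(node);
                }
            }
            default:
                return defaultRule(node);
        }
    }

    // Hypothetical stand-in for a method that emulates a built-in rule.
    static String defaultRule(Node node) {
        return "default rule: " + node.getNodeName();
    }

    // Parse the XML and return the first element with the given name.
    static Node firstElement(String xml, String name) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getElementsByTagName(name)
                .item(0);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<theData><author>R.Baldwin</author><price>$9.95</price></theData>";
        System.out.println(processNode(firstElement(xml, "price")));
        System.out.println(processNode(firstElement(xml, "author")));
    }
}
```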

The ELEMENT_NODE case

Most of the changes to this program (as compared to the program in the previous lesson) consist of changes to the code that processes element nodes in the switch statement.  The code for this case is rather long, so I will discuss it in fragments.

A match for element nodes named top

The beginning of the case for element nodes is shown in Listing 10.

        case Node.ELEMENT_NODE:{
          if(node.getNodeName().equals("top")){
            out.println("B Match top");
            applyTemplates(node,"theData");

Listing 10


I will begin by calling your attention to the similarity between the code in Listing 10 and the XSLT template rule shown earlier in Figure 6 and Listing 2.

The if statement in Listing 10 returns true if the name of the element node being processed is top.  That corresponds to the XSLT match pattern in the first line in Listing 2.

The literal text B Match top that Listing 10 sends to the output corresponds to the literal text in the XSLT template rule in Listing 2.

The invocation of the method named applyTemplates in Listing 10 corresponds to the xsl:apply-templates element in Listing 2.

The applyTemplates method

The only code in Listing 10 that is of any complexity is the invocation of the applyTemplates method.

The applyTemplates method in this program is identical to the method having the same name in the previous lesson.  I discussed the method in detail in that lesson.  Therefore, I won't discuss it further in this lesson.  However, an understanding of that method is critical to an understanding of this program.  If you haven't done so already, I strongly urge you to go back and review the previous lesson entitled Java JAXP, Implementing Default XSLT Behavior in Java .
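
For readers who don't have the previous lesson at hand, the general idea of a method that emulates xsl:apply-templates with a select name can be sketched as follows.  This is not the applyTemplates method from Dom11 or Dom12, whose code is not reproduced here.  It is a simplified, hypothetical version that assumes the select value is the name of a child element, and it writes to a StringBuilder rather than to an output stream.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ApplyTemplatesSketch {
    private final StringBuilder out = new StringBuilder();

    // Two simplified template rules, one for top and one for theData.
    void processNode(Node node) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // rules for other node types are omitted in this sketch
        }
        if (node.getNodeName().equals("top")) {
            out.append("B Match top\n");
            applyTemplates(node, "theData");
        } else if (node.getNodeName().equals("theData")) {
            out.append("C Match theData\n");
        }
    }

    // Process, in document order, every child of the context node
    // whose name matches select.  This is where the recursion occurs.
    void applyTemplates(Node context, String select) {
        NodeList children = context.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeName().equals(select)) {
                processNode(child);
            }
        }
    }

    static String run(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            ApplyTemplatesSketch sketch = new ApplyTemplatesSketch();
            sketch.processNode(root);
            return sketch.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(run("<top><theData/><other/><theData/></top>"));
    }
}
```

In this sketch the element named other is never processed, because no rule applies templates to it.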

A match for element nodes named theData

Continuing with the case for element nodes, the code in Listing 11 shows an else if clause that matches element nodes named theData.

(Note that this is an else if clause that follows the if statement begun in Listing 10.)

          }else if(node.getNodeName().equals(
                                    "theData")){
            out.println("C Match theData and "
                            + "show attribute");
            out.println(valueOf(node,"@attr"));
            applyTemplates(node,"title");

Listing 11


Once again, I will point out the similarity of the code in Listing 11 to the XSLT template rule shown in Figure 7 and Listing 3.

This code will be executed for all element nodes named theData that are passed as an input parameter to the processNode method.  This code puts the literal text C Match theData and show attribute into the output, just as the template rule in Listing 3 does.

This code invokes the valueOf method and the applyTemplates method in a way that is very similar to the way the template rule executes the xsl:value-of element and the xsl:apply-templates element.

The valueOf method

The valueOf method in this program is identical to the method having the same name in the previous lesson.  However, this program uses portions of that method that I didn't discuss in the previous lesson.  Therefore, I will set the discussion of the switch statement in the processNode method aside temporarily, follow the thread of execution, and discuss the valueOf method in some detail in the paragraphs that follow.

Request value of attribute named attr

Note the parameters being passed to the valueOf method in Listing 11.  The first parameter is a reference to the Node object being processed by the processNode method.  The second parameter is a String that begins with the @ character and continues with the characters attr.  As is the case for the template rule in Listing 3, this invocation of the valueOf method requests the value of the attribute named attr belonging to the node that is passed as the first parameter.

Description of the valueOf method

The valueOf method emulates the following XSLT element:
<xsl:value-of select="???"/>
The general form of the method call is:
valueOf(Node theNode,String select)
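The valueOf method recognizes three forms of call based on the value of the select parameter:
  1. A string that begins with @, which requests the value of a named attribute.
  2. The string ".", which requests the value of the context node.
  3. Any other non-null string, which is treated as the name of a child of the context node.
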
Return the value of a named attribute

In the first form, the method returns the text value of the named attribute of the Node. An attribute is specified by a select value that begins with @. The name of the attribute follows the @ character in the string.  If the attribute doesn't exist, the method returns an empty string.

Return the value of the context node

In the second form, the method returns the concatenated text values of the context node and its descendants.  This form of call was discussed in detail in the previous lesson, so I will only mention it briefly in this lesson.

Return the value of a specified child of the context node

In the third form, the method returns the concatenated text values of a specified child node of the context node and its descendants. If the context node has more than one child node with the specified name, only the first one found is processed. The others are ignored.

I will discuss this form of method call later in the lesson when it occurs in the execution thread.

Method does not support ...

The valueOf method does not support the following standard features of xsl:value-of:

The valueOf method code

The beginning of the valueOf method is shown in Listing 12.

  public String valueOf(Node node,String select){

    if(select != null
                  && select.charAt(0) == '@'){

      String attrName = select.substring(1);

Listing 12


The method begins by testing the incoming parameter to see if it starts with the @ character.  If so, the method call is interpreted as a request to return the value of an attribute belonging to the node specified by the first parameter.  The name of the attribute is specified by the characters following the @ character in the incoming string.

Get the attribute name

The code in Listing 12 uses the substring method of the String class to get the name of the attribute and to save it in the reference variable named attrName.

(As you will see shortly, if the attribute doesn't exist on that node, the method simply returns an empty string as the return value.)

Get the attribute node

Following this, the program executes the two statements in Listing 13 to access the attribute node and to save it in the reference variable named attrNode.

      NamedNodeMap attrList =
                           node.getAttributes();
      Node attrNode = attrList.getNamedItem(
                                      attrName);

Listing 13


A map of attribute nodes


Attribute nodes are not simply child nodes of element nodes.  In particular, all child nodes of an element node can be obtained in a collection of type NodeList by invoking the method named getChildNodes on the element node, but that collection does not include the attribute nodes belonging to the element.

In order to get the attributes belonging to an element node, it is necessary to invoke the method named getAttributes on the element node.  This method returns a reference to an object of type NamedNodeMap.  This object contains unordered references to all the attribute nodes belonging to the element node.

Save the attribute node's reference

References to objects representing attribute nodes can be accessed in a NamedNodeMap object either on the basis of the attribute name, or on the basis of an ordinal index.

(Access by ordinal index is supported for convenience even though the references are unordered.  No ordering is implied by the ordinal index.)

The code in Listing 13 invokes the getNamedItem method on the NamedNodeMap object to retrieve the node specified by its name.  The attribute node's reference is returned as type Node and saved in the variable named attrNode.

Return value of attribute node

The code in Listing 14 invokes the getNodeValue method to get and return the value of the attribute node.

      if(attrNode != null){
        return attrNode.getNodeValue();
      }else{
        return "";//empty string
      }//end else
    }//end if on @

Listing 14


If the context node doesn't have an attribute with that name, the value of attrNode will be null.  In that case, the valueOf method returns an empty string.
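
The attribute lookup described above can be sketched as a small stand-alone class.  This is a hedged illustration, not the valueOf method from Dom12.  The class and method names are hypothetical.  The sketch uses startsWith rather than charAt(0), so that an empty select string returns an empty string instead of throwing an exception, and it also shows access by ordinal index, sorting the names because the map implies no ordering.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttrValueSketch {
    // Return the value of the attribute named after the @ character,
    // or an empty string if the attribute doesn't exist.
    static String attrValue(Node node, String select) {
        if (select == null || !select.startsWith("@")) {
            return "";
        }
        NamedNodeMap attrs = node.getAttributes(); // null for non-element nodes
        if (attrs == null) {
            return "";
        }
        Node attr = attrs.getNamedItem(select.substring(1));
        return attr != null ? attr.getNodeValue() : "";
    }

    // Access by ordinal index; the names are sorted for a stable result.
    static String sortedAttrNames(Node node) {
        NamedNodeMap attrs = node.getAttributes();
        List<String> names = new ArrayList<>();
        for (int i = 0; attrs != null && i < attrs.getLength(); i++) {
            names.add(attrs.item(i).getNodeName());
        }
        Collections.sort(names);
        return String.join(",", names);
    }

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element subtitle = parseRoot("<subtitle position=\"Low\" lang=\"en\"/>");
        System.out.println(attrValue(subtitle, "@position"));
        System.out.println(attrValue(subtitle, "@attr").isEmpty());
        System.out.println(sortedAttrNames(subtitle));
    }
}
```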

The remainder of the valueOf method

That completes the portion of the valueOf method used to return the value of an attribute.  Listing 15 shows the overall structure of the remainder of the valueOf method, to help you keep track of the big picture.  (Most of the code was deleted from Listing 15 for brevity.)

    else if(select != null
                      && select.equals(".")){
      //Process the context node
      //Code deleted for brevity
    }//end if for context node

    else if(select != null){
      //Process a selected child node
      //Code deleted for brevity
    }//end else if(select != null)

    return "";//empty string
  }//end method valueOf

Listing 15


I will return to a discussion of the valueOf method later in this lesson, at which time I will discuss some of the code that was deleted from Listing 15.

Back to the template rule

Please return your attention to Listing 11, which emulates the XSLT template rule shown in Listing 3.  When the valueOf method returns the value of the attribute named attr (or returns an empty string), the code in Listing 11 invokes the applyTemplates method to cause templates to be applied to theData's child nodes named title.

Once again, note the similarity of this code to the XSLT template rule shown in Listing 3.

Back to the switch statement

Control flows recursively through the applyTemplates method back to the element node case for the element named title in the switch statement in the processNode method.  That code begins in Listing 16.

          }else if(node.getNodeName().equals(
                                      "title")){
            out.println("D Match title and show "
                  + "value of title as context");
            out.println(valueOf(node,"."));

Listing 16


Note the similarity of this code and the beginning of the XSLT template rule shown in Listing 4.

By now, the code in Listing 16 should be very familiar to you and should require very little in the way of an explanation.  This code begins by sending a literal text string to the output.  Then it gets the value of the context node named title and sends that text to the output as well.  (A value of "." for the second parameter of the valueOf method requests the value of the context node.)

Invoke valueOf with select equal to subtitle

The remaining code that emulates the XSLT template rule shown in Listing 4 is shown in Listing 17.

            out.println(
                     "E Show value of subtitle");
            out.println(valueOf(
                               node,"subtitle"));
            applyTemplates(node,"subtitle");

Listing 17


This code begins by sending literal text to the output.  Then it invokes the valueOf method passing the name of the node named subtitle as the select parameter.  That brings us to a discussion of the one remaining portion of the valueOf method not previously discussed.

Overall structure of the valueOf method

Listing 18 shows a greatly condensed version of the two sections of the valueOf method that were discussed previously (one in this lesson and one in the previous lesson).  The code in Listing 18 is provided to help you understand the overall structure of the valueOf method and to keep track of the big picture.

  public String valueOf(Node node,String select){

    if(select != null
                  && select.charAt(0) == '@'){
      //Request for the value of an attribute.
      //Code deleted for brevity
    }//end if on @

    else if(select != null
                      && select.equals(".")){
      //Process the context node
      //Code deleted for brevity
    }//end if for context node

Listing 18


Return the value of a specified child node


Listing 19 shows that portion of the valueOf method that processes a child node whose name is specified by the value of the incoming parameter named select.  This code returns the concatenated text values of the specified child node and all of its descendants.

    else if(select != null){
      NodeList children = node.getChildNodes();
      int len = children.getLength();
      for (int i = 0; i < len; i++){
        //Trap the specified child node
        if(children.item(i).getNodeName().
                                equals(select)){
          //Make a recursive call
          return valueOf(children.item(i),".");
        }//end if getNodeName == select
      }//end for loop on all child nodes
    }//end else if(select != null)
    return "";//empty string
  }//end method valueOf

Listing 19

(This process assumes that there is only one child node with the specified name and processes the first one that it finds.  If there are additional child nodes having the same name, they are ignored.)

Comfortable with recursion?

Assuming that you are comfortable with recursion, the code in Listing 19 is relatively straightforward.  This code gets a list of the child nodes of the context node, searches the list for the first child whose name matches the value of select, and makes a recursive call to valueOf, passing that child node and "." as parameters.  The value returned by the recursive call to valueOf is returned by the current call to the valueOf method.

I discussed the portion of the valueOf method that returns the value of the context node in the previous lesson, so I won't repeat that discussion here.
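
The child-node form and the context-node form can be sketched together as follows.  This is a minimal sketch under stated assumptions, not the valueOf method from Dom12.  In particular, the textOf helper is a hypothetical stand-in for the context-node code that was discussed in the previous lesson, and the sample XML is a whitespace-free miniature of the title node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ValueOfChildSketch {
    static String valueOf(Node node, String select) {
        if (select == null) {
            return "";
        }
        if (select.equals(".")) {
            // Context-node form: concatenated text of the node and descendants.
            return textOf(node);
        }
        // Child form: only the first child with a matching name is used.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeName().equals(select)) {
                return valueOf(child, "."); // recursive call
            }
        }
        return "";
    }

    // Hypothetical helper: concatenate text in document order.
    static String textOf(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return node.getNodeValue();
        }
        StringBuilder sb = new StringBuilder();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            sb.append(textOf(c));
        }
        return sb.toString();
    }

    // Parse the XML and evaluate select with the root element as context.
    static String demo(String xml, String select) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            return valueOf(root, select);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<title>Java<subtitle>really<part1>P1</part1></subtitle>"
            + "<subtitle>second</subtitle>rules</title>";
        System.out.println(demo(xml, "subtitle"));
        System.out.println(demo(xml, "."));
    }
}
```

The second subtitle element contributes nothing to the first call, which illustrates that additional children with the same name are ignored.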

Back to the switch statement

Once again, that takes us back to the code in Listing 17, which emulates the latter portion of the XSLT template rule in Listing 4.  Note that upon return from valueOf, the code in Listing 17 invokes the applyTemplates method passing the name subtitle as the select parameter.

Control flows recursively through the applyTemplates method back to the element node case for the element named subtitle in the switch statement in the processNode method.  That code is shown in Listing 20.

          }else if(node.getNodeName().equals(
                                   "subtitle")){
            out.println("F match subtitle and "
                    + "show value of attribute");
            out.println(valueOf(
                              node,"@position"));
            out.println("G Show value of "
                   + "subtitle as context node");
            out.println(valueOf(node,"."));

Listing 20


Compare the code in Listing 20 with the XSLT template rule in Listing 5.

Nothing new here

All of the code in Listing 20 is similar to code that I have already discussed in detail.  Therefore, not much in the way of further discussion should be needed.

No call to applyTemplates

However, there is one very important thing to note in Listing 20.  The code in Listing 20 does not make a call to applyTemplates.  Therefore, the code in Listing 20 signals the end of the recursive flow of control being used to traverse this leg of the DOM tree.  All of the methods that have been called recursively in order to get to this point in the DOM tree will start returning in the reverse of the order in which they were called.

Finish the case for Node.ELEMENT_NODE

Listing 21 shows the completion of the code for the element node case that began in Listing 10.  This code will be invoked if an element node is encountered with a name that does not match top or one of the node names in the sequential else if constructs discussed above.

The code in Listing 21 invokes a method named defElOrRtNodeTemp that emulates one of the built-in XSLT template rules.  This method and the methods that emulate the other built-in template rules were discussed in detail in the previous lesson.

          }else{//invoke default behavior
            defElOrRtNodeTemp(node);
          }//end else
          break;
        }//end case ELEMENT_NODE

Listing 21


The remainder of the processNode method


Listing 22 shows the remaining code in the processNode method.  All of the remaining cases in the switch statement invoke methods that emulate built-in XSLT template rules.

The code in Listing 22 is identical to the same code in the previous lesson where it was discussed in detail.  Therefore, I won't discuss it further in this lesson.

      switch (type){

        //Code extracted for detailed discussion

        case Node.TEXT_NODE:{
          if(false){
          }else{//invoke default behavior
            out.print(defTextOrAttrTemp(node));
          }//end else
          break;
        }//end case Node.TEXT_NODE

        case Node.ATTRIBUTE_NODE:{
          if(false){
          }else{//invoke default behavior
            out.print(defTextOrAttrTemp(node));
          }//end else
          break;
        }//end case Node.ATTRIBUTE_NODE

        case Node.COMMENT_NODE:{
          if(false){
          }else{//invoke default behavior
            defComOrProcInstrTemp(node);
          }//end else
          break;
        }//end case COMMENT_NODE

        case Node.PROCESSING_INSTRUCTION_NODE:{
          if(false){
          }else{//invoke default behavior
            //Save proc instr for later use.
            procInstr.add(node);
            //Invoke default behavior.
            defComOrProcInstrTemp(node);
          }//end else
          break;
        }//end case PROCESSING_INSTRUCTION_NODE

        default:{
          //Ignore all other node types.
        }//end default

      }//end switch
    }catch(Exception e){
      e.printStackTrace(System.err);
    }//end catch
  }//end processNode(Node)

Listing 22


The program output

The output produced by this program is essentially the same as the XSLT transform output discussed in the early part of the lesson.  With some minor exceptions having to do with blank lines, the output shown in Figure 3 represents the output both of the program and the XSLT transform.

Compare with XSL stylesheet

To summarize the situation, I'm going to show you one more view of the new code in the program for comparison with the XSL stylesheet in Listing 26.

The code in Listing 23 plus the one red statement in Listing 7 is analogous to the stylesheet shown in Listing 26 from a functional viewpoint. 

case Node.ELEMENT_NODE:{
  if(node.getNodeName().equals("top")){
    out.println("B Match top");
    applyTemplates(node,"theData");

  }else if(node.getNodeName().equals("theData")){
    out.println(
            "C Match theData and show attribute");
    out.println(valueOf(node,"@attr"));
    applyTemplates(node,"title");

  }else if(node.getNodeName().equals("title")){
    out.println(
       "D Match title and show value of title as "
                                     + "context");
    out.println(valueOf(node,"."));
    out.println("E Show value of subtitle");
    out.println(valueOf(node,"subtitle"));
    applyTemplates(node,"subtitle");

  }else if(node.getNodeName().equals("subtitle")){
    out.println(
    "F match subtitle and show value of attribute");
    out.println(valueOf(node,"@position"));
    out.println(
        "G Show value of subtitle as context node");
    out.println(valueOf(node,"."));

  }else{//invoke default behavior
    defElOrRtNodeTemp(node);
  }//end else
  break;
}//end case ELEMENT_NODE

Listing 23


As you can see, the code in Listing 23 is no more complex than the stylesheet.  The point is that once you have a library of Java methods that emulate the required XSLT elements, it is no more difficult to write a Java program to transform an XML document than it is to write an XSL stylesheet to transform the same document.
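To illustrate how little code one additional rule takes, the following condensed sketch adds a single custom rule for an author element on top of the built-in behavior. It is a stand-alone reduction of the pattern, not an excerpt from Dom12. The class name MiniTemplateSketch, the method valueOfContext, the "Author: " label, and the rule itself are hypothetical. The rule is meant to be analogous to a template with match="author" that emits a label followed by the value of the context node.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

//Hypothetical class, not part of Dom12.
public class MiniTemplateSketch{
  final StringBuilder out = new StringBuilder();

  //Emulates value-of with select="." for element
  // and text nodes only.
  static String valueOfContext(Node node){
    if(node.getNodeType() == Node.TEXT_NODE){
      return node.getNodeValue();
    }//end if
    StringBuilder text = new StringBuilder();
    if(node.getNodeType() == Node.ELEMENT_NODE){
      NodeList kids = node.getChildNodes();
      for(int i = 0; i < kids.getLength(); i++){
        text.append(valueOfContext(kids.item(i)));
      }//end for loop
    }//end if
    return text.toString();
  }//end valueOfContext

  //Emulates apply-templates. A null select value
  // processes all child nodes.
  void applyTemplates(Node node,String select){
    NodeList kids = node.getChildNodes();
    for(int i = 0; i < kids.getLength(); i++){
      Node kid = kids.item(i);
      if(select == null
           || select.equals(kid.getNodeName())){
        processNode(kid);
      }//end if
    }//end for loop
  }//end applyTemplates

  void processNode(Node node){
    switch(node.getNodeType()){
      case Node.ELEMENT_NODE:{
        if(node.getNodeName().equals("author")){
          //Hypothetical custom rule
          out.append("Author: ")
             .append(valueOfContext(node))
             .append('\n');
        }else{//built-in rule for elements
          applyTemplates(node,null);
        }//end else
        break;
      }
      case Node.DOCUMENT_NODE:{
        applyTemplates(node,null);
        break;
      }
      case Node.TEXT_NODE:{
        out.append(node.getNodeValue());
        break;
      }
      default:{
        //Ignore all other node types.
      }
    }//end switch
  }//end processNode

  //Parses the XML text, processes the tree, and
  // returns the accumulated output.
  static String run(String xml) throws Exception{
    Document doc = DocumentBuilderFactory
        .newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(
                             xml.getBytes("UTF-8")));
    MiniTemplateSketch sketch =
                            new MiniTemplateSketch();
    sketch.processNode(doc);
    return sketch.out.toString();
  }//end run
}//end class MiniTemplateSketch
```

The custom rule occupies one branch of the if statement, just as a custom template occupies one xsl:template element in a stylesheet.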

Run the Program

I encourage you to copy the Java code, XML file, and XSL file from the listings near the end of this lesson.  Compile and execute the program.  Experiment with the files, making changes, and observing the results of your changes.
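For quick experiments with individual template rules, it can be convenient to run a stylesheet against a document without creating any files. The following sketch uses only the standard JAXP transformation classes. The class name InMemoryXslt and the two sample strings are assumptions for illustration. The sample stylesheet is a cut-down variant in the spirit of Dom12.xsl, not a copy of it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

//Hypothetical experiment harness, not part of
// Dom12.
public class InMemoryXslt{

  //Sample document and stylesheet, held in strings.
  static final String SAMPLE_XML =
      "<top><theData attr=\"Dummy Attr Value\"/></top>";

  static final String SAMPLE_XSL =
      "<xsl:stylesheet xmlns:xsl="
    + "\"http://www.w3.org/1999/XSL/Transform\" "
    + "version=\"1.0\">"
    + "<xsl:template match=\"top\">B Match top\n"
    + "<xsl:apply-templates select=\"theData\"/>"
    + "</xsl:template>"
    + "<xsl:template match=\"theData\">"
    + "C Match theData and show attribute\n"
    + "<xsl:value-of select=\"@attr\"/>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  //Applies the stylesheet text to the XML text and
  // returns the result as a String.
  static String transform(String xml,String xsl)
                                  throws Exception{
    Transformer transformer =
        TransformerFactory.newInstance()
            .newTransformer(new StreamSource(
                            new StringReader(xsl)));
    StringWriter result = new StringWriter();
    transformer.transform(
        new StreamSource(new StringReader(xml)),
        new StreamResult(result));
    return result.toString();
  }//end transform

  public static void main(String[] args)
                                  throws Exception{
    System.out.println(
                transform(SAMPLE_XML,SAMPLE_XSL));
  }//end main
}//end class InMemoryXslt
```

Editing the two strings and rerunning the program gives a fast way to see how a change to a template rule affects the result before making the same change in the real files.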

Summary

In this lesson, I showed you how to write a Java program that mimics an XSLT transformation for converting an XML file into a text file.  I showed that once you have a library of Java methods that emulate XSLT elements, it is no more difficult to write a Java program to transform an XML document than it is to write an XSL stylesheet to transform the same document.

What's Next?

In the next lesson, I will show you how to use XSLT to transform an XML document into an XHTML document.  I will also show you how to write Java code that performs the same transformation.

Complete Program Listings

Complete listings of the various files discussed in this lesson are contained in the listings that follow.

/*File Dom12.java
Copyright 2003 R.G.Baldwin

This is an update of Dom11 designed to test
features of various methods that were not
tested by the sample used with Dom11.

Mainly, this version adds code to the processNode
method to simulate the template rules in the
XSL file named Dom12.xsl.

Also, as before, this program implements all six
built-in template rules for an XML processor.

To create a custom template rule:
1. Go to the processNode method.
2. Identify the node type.
3. Change the conditional clause in the if
statement to implement the match.
4. Write code in the body of the if statement to
implement the custom rule.

If the modified conditional clause evaluates to
true, the custom rule will be executed. If
false, the default rule will be executed.

This program compares the transformation of a
specified XML file into a result file, using two
different approaches:

1. An XSLT style sheet and transformation.
2. Program code that emulates the behavior of the
XSL transformation.

In particular, this program illustrates Java code
that emulates the XSLT templates in the file
named Dom12.xsl.

The program requires three command line
parameters in the following order:
1. The name of the input XML file - must be
Dom12.xml.
2. The name of the output file to be
produced by the XSL transformation.
3. The name of the output file to be
produced by the program code that emulates
the XSL transformation.

The name of the XSL stylesheet file is extracted
from the processing instruction in the XML file.

The program begins by executing code to transform
the incoming XML file in a way that mimics the
XSL Transformation. Along the way, it saves the
processing instructions containing the ID of the
stylesheet file for use by the XSLT process
later. Otherwise, the code that performs the
XSL transformation later would have to search the
DOM tree for the XSL stylesheet file.

Then the program uses the XSLT style sheet to
transform the XML file into a result file.

No effort was made to provide meaningful
information about errors and exceptions.

Tested with SDK 1.4.2 under WinXP.
************************************************/

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;

import org.w3c.dom.*;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.*;

import java.util.*;
import java.io.*;

public class Dom12{

PrintWriter out;//output stream
//Save processing instruction nodes here
static Vector procInstr = new Vector();

public static void main(String argv[]){
if (argv.length != 3){
System.err.println(
"usage: java Dom12 "
+ "xmlFileIn "
+ "xformFileOut "
+ "codeFileOut");
System.exit(0);
}//end if

try{
//Get a factory object for DocumentBuilder
// objects
///
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();

//Configure the factory object. Change
// the following parameter to false for a
// non-validating parser.
///
factory.setValidating(true);
factory.setNamespaceAware(false);
//The following statement causes the parser
// to ignore cosmetic whitespace between
// elements.
///
factory.
setIgnoringElementContentWhitespace(true);

//Get a DocumentBuilder (parser) object
///
DocumentBuilder builder =
factory.newDocumentBuilder();

//Parse the XML input file to create a
// Document object that represents the
// input XML file.
///
Document document = builder.parse(
new File(argv[0]));

//Instantiate an object of this class
///
Dom12 thisObj = new Dom12();

//TRANSFORMATION THROUGH PROGRAM CODE
//Use program code to transform the
// DOM tree into an output file.
//
//Get an output stream for the output
// produced by the program code. This
// stream object is used by several
// methods, so it was instantiated at this
// point and saved as an instance variable
// of the object.
///
thisObj.out = new PrintWriter(
new FileOutputStream(argv[2]));

//Process the DOM tree, beginning with the
// Document node to produce the output.
// Invocation of processDocumentNode starts
// a recursive process that processes the
// entire DOM tree.
///
thisObj.processDocumentNode(document);


//XSLT TRANSFORMATION
//Use XSLT to transform the DOM tree into
// an output file. Note that the success
// of this method call depends on the
// stylesheet processing instruction having
// been saved while the transformation was
// being performed using program code
// above. Otherwise, it would be necessary
// to include the code in this method to
// search the DOM tree for the stylesheet
// processing instruction. All processing
// instructions are saved in a Vector
// object, which is passed as the third
// parameter to this method.
///
thisObj.doXslTransform(
document,argv[1],procInstr);

}catch(Exception e){
//Note that no effort was made to provide
// meaningful results in the event of an
// exception or error.
///
e.printStackTrace(System.err);
}//end catch
}// end main()
//-------------------------------------------//

//This method is used to produce any text
// required in the output at the document
// level, such as the XML declaration for an
// XML document.
///
void processDocumentNode(Node node){
//Write one line of text into the output.
///
out.println("<?xml version=\"1.0\" "
+ "encoding=\"UTF-8\"?>");
out.println("A Match Root");

//Go process the root (document) node. This
// method call triggers a recursive process
// that processes the entire DOM tree.
///
processNode(node);

out.flush();
}//end processDocumentNode
//-------------------------------------------//

//There are seven kinds of nodes:
// root or document
// element
// attribute
// text
// comment
// processing instruction
// namespace
//
//This method handles the first six.
// Apparently it is not possible to handle
// namespace nodes in Java because there is
// no constant in the Node class to identify
// namespace nodes
///
void processNode(Node node){

try{
if (node == null){
System.err.println(
"Nothing to do, node is null");
return;
}//end if

//Process the incoming node based on its
// type.
///
int type = node.getNodeType();

//To define an overriding template rule,
// insert the matching condition in the
// conditional clause of the if statement,
// and provide code to implement the rule
// in the body of the if statement. If the
// conditional clause evaluates to true,
// the default rule for that element type
// will not be processed.
///
switch (type){
case Node.TEXT_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
out.print(defTextOrAttrTemp(node));
}//end else
break;
}//end case Node.TEXT_NODE

case Node.ATTRIBUTE_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
out.print(defTextOrAttrTemp(node));
}//end else
break;
}//end case Node.ATTRIBUTE_NODE

case Node.ELEMENT_NODE:{
if(node.getNodeName().equals("top")){
out.println("B Match top");
applyTemplates(node,"theData");
}else if(node.getNodeName().equals(
"theData")){
out.println("C Match theData and "
+ "show attribute");
out.println(valueOf(node,"@attr"));
applyTemplates(node,"title");
}else if(node.getNodeName().equals(
"title")){
out.println("D Match title and show "
+ "value of title as context");
out.println(valueOf(node,"."));
out.println(
"E Show value of subtitle");
out.println(valueOf(
node,"subtitle"));
applyTemplates(node,"subtitle");
}else if(node.getNodeName().equals(
"subtitle")){
out.println("F match subtitle and "
+ "show value of attribute");
out.println(valueOf(
node,"@position"));
out.println("G Show value of "
+ "subtitle as context node");
out.println(valueOf(node,"."));
}else{//invoke default behavior
defElOrRtNodeTemp(node);
}//end else
break;
}//end case ELEMENT_NODE

case Node.DOCUMENT_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
defElOrRtNodeTemp(node);
}//end else
break;
}//end case DOCUMENT_NODE

case Node.COMMENT_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
defComOrProcInstrTemp(node);
}//end else
break;
}//end case COMMENT_NODE

case Node.PROCESSING_INSTRUCTION_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
//Save proc instr for later use
procInstr.add(node);
}else{//invoke default behavior
//First save proc instr for later
// use.
///
procInstr.add(node);
//Now invoke default behavior.
///
defComOrProcInstrTemp(node);
}//end else
break;
}//end case PROCESSING_INSTRUCTION_NODE

default:{
//Ignore all other node types.
}//end default

}//end switch

}catch(Exception e){
e.printStackTrace(System.err);
}//end catch
}//end processNode(Node)
//-------------------------------------------//

//This method emulates the following default
// template rule:
// <xsl:template match="text()|@*">
// <xsl:value-of select="."/>
// </xsl:template>
///
String defTextOrAttrTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.ATTRIBUTE_NODE)
|| (nodeType == Node.TEXT_NODE)){
//Get and return the value of the context
// node.
///
return valueOf(node,".");
}else{
throw new Exception(
"Bad call to defTextOrAttrTemp");
}//end else
}//end defTextOrAttrTemp
//-------------------------------------------//

//This method emulates the following default
// template rule:
// <xsl:template match="*|/">
// <xsl:apply-templates/>
// </xsl:template>
///
void defElOrRtNodeTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.ELEMENT_NODE) ||
(nodeType == Node.DOCUMENT_NODE)){
//Note that the following is a recursive
// method call.
///
applyTemplates(node,null);
}else{
throw new Exception(
"Bad call to defElOrRtNodeTemp");
}//end else
}//end defElOrRtNodeTemp
//-------------------------------------------//

//This method emulates the following default
// template rule:
// <xsl:template
// match="processing-instruction()|comment()"/>
///
String defComOrProcInstrTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.COMMENT_NODE) ||
(nodeType ==
Node.PROCESSING_INSTRUCTION_NODE)){
//According to Nutshell pg 148, the
// default rule for comments and processing
// instructions doesn't output anything
// into the result tree.
///
return "";//empty string
}else{
throw new Exception("Bad call to " +
"defComOrProcInstrTemp");
}//end else
}//end defComOrProcInstrTemp
//-------------------------------------------//

//See Nutshell, pg 148 for an explanation as to
// why it is not possible to write a Java
// method that emulates the default namespace
// template.
///
void defaultNamespaceTemplate(Node node)
throws Exception{
throw new Exception("See Nutshell pg 148 " +
"regarding default behavior for " +
"namespace template.");
}//end defaultNamespaceTemplate
//-------------------------------------------//

//Simulates an XSLT apply-templates rule.
// <xsl:apply-templates
// optional select = "..."
// optional mode = "..."
// >
//Note that the mode attribute is not supported
// in this version.
//If the select parameter is null, all child
// nodes are processed.
void applyTemplates(Node node,String select){
NodeList children = node.getChildNodes();
if (children != null){
int len = children.getLength();
//Iterate on NodeList of child nodes.
for (int i = 0; i < len; i++){
if((select == null) ||
(select.equals(children.item(i).
getNodeName()))){
//Note that the following is a
// recursive method call.
///
processNode(children.item(i));
}//end if
}//end for loop
}//end if children != null

}//end applyTemplates
//-------------------------------------------//

//This method simulates an XSLT
// <xsl:value-of select="???"/>
// The general form of the method call is
// valueOf(Node theNode,String select)
//
//The method recognizes three forms of call:
// valueOf(Node theNode,String "@attrName")
// valueOf(Node theNode,String ".")
// valueOf(Node theNode,String "nodeName")
//
//In the first form, the method returns the
// text value of the named attribute of
// theNode. An attribute is specified by a
// select value that begins with @. If the
// attribute doesn't exist, the method returns
// an empty string.
//
//In the second form, the method returns the
// concatenated text values of descendants of
// the context node.
//
//In the third form, the method returns the
// concatenated text values of all descendants
// of a specified child node of the context
// node. If the context node has more than one
// child node with the specified name, only the
// first one found is processed. The others
// are ignored.
//
//The method does not support the following,
// which are standard features of xsl:value-of:
// disable-output-escaping
// processing instruction nodes
// comment nodes
// namespace nodes
///

public String valueOf(Node node,String select){

if(select != null
&& select.startsWith("@")){
//This is a request for the value of an
// attribute. Returns empty string if the
// attribute doesn't exist on the element.
String attrName = select.substring(1);
NamedNodeMap attrList =
node.getAttributes();
Node attrNode = attrList.getNamedItem(
attrName);
if(attrNode != null){
return attrNode.getNodeValue();
}else{
return "";//empty string
}//end else
}//end if on @

else if(select != null
&& select.equals(".")){
//This is a request to process the context
// node
int nodeType = node.getNodeType();
if(nodeType == Node.ELEMENT_NODE){
//Process the context node as an element
// node. Return the concatenated text
// values of all descendants of the
// context node.
NodeList childNodes =
node.getChildNodes();
int listLen = childNodes.getLength();
String nodeTextValue = "";//result

for(int j = 0; j < listLen; j++){
nodeTextValue +=
valueOf(childNodes.item(j),".");
}//end for loop
return nodeTextValue;
}else if(nodeType == Node.TEXT_NODE){
//Process the context node as a text
// node. Simply get and return its
// value.
return node.getNodeValue();
}else{
//ignore all other context node types
}//end else
}//end if for context node

else if(select != null){
//Process a child node whose name is
// specified by the value of the incoming
// parameter named select. Get and return
// the concatenated text values of all
// descendants of the specified child node.
//This process assumes that there is only
// one child node with the specified name
// and processes the first one that it
// finds.
NodeList children = node.getChildNodes();
int len = children.getLength();
for (int i = 0; i < len; i++){
//Trap the specified child node
if(children.item(i).getNodeName().
equals(select)){
//Make a recursive call and let
// existing code do the work.
return valueOf(children.item(i),".");
//The above return statement causes any
// additional child nodes having the
// same name to be ignored.
}//end if getNodeName == select
}//end for loop on all child nodes
}//end else if(select != null)
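//Will reach here if the value of select is
// null, if no matching child node was found,
// or if the context node is neither an
// element node nor a text node.
///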
return "";//empty string
}//end method valueOf
//-------------------------------------------//

//This method uses an incoming XSLT stylesheet
// file to transform an incoming Document
// object into an output file. Note that the
// successful invocation of this method depends
// on the processing instruction containing the
// stylesheet having been saved in a Vector
// object that is received as an incoming
// parameter. Otherwise, this method would
// have to search the DOM for the stylesheet
// processing instruction.
///
void doXslTransform(Document document,
String outFile,
Vector procInstr)
throws Exception{
try{
//Get stylesheet ID from proc instr.
ProcessingInstruction pi = null;
boolean piFlag = false;
int size = procInstr.size();
//Search for a stylesheet in the Vector
// containing processing instruction nodes.
///
for(int i = 0; i < size; i++){
pi = (ProcessingInstruction)procInstr.
get(i);
if(pi.getTarget().startsWith(
"xml-stylesheet") && pi.getData().
startsWith("type=\"text/xsl\"")){
//Looks like a good stylesheet.
///
piFlag = true;
break;
}//end if
}//end for loop
if(piFlag == false){//still false?
throw new Exception(
"No valid stylesheet");
}//end if
//Get the stylesheet file reference
///
String xslFile = pi.getData().
substring(pi.getData().indexOf(
"href=")+6);
//Eliminate the quotation mark at the end
///
xslFile = xslFile.substring(
0,xslFile.length()-1);

//Get a TransformerFactory object
///
TransformerFactory xformFactory =
TransformerFactory.newInstance();
//Get an XSL Transformer object based on
// the XSL file discovered above.
///
Transformer transformer =
xformFactory.newTransformer(
new StreamSource(
new File(xslFile)));
//Get a DOMSource object that represents
// the DOM tree.
///
DOMSource source = new DOMSource(document);

//Get an output stream for the output
// file.
///
PrintWriter xformStream = new PrintWriter(
new FileOutputStream(outFile));

//Get a StreamResult object that points to
// the output file. Then transform the DOM
// sending text to the output file.
///
StreamResult xformResult =
new StreamResult(xformStream);

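//Do the transform, then close the stream
// to make sure that all of the output
// reaches the file.
///
transformer.transform(source,xformResult);
xformStream.close();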
}catch(Exception e){
e.printStackTrace(System.err);
}//end catch

}//end doXslTransform

}// class Dom12

Listing 24
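The doXslTransform method in Listing 24 extracts the stylesheet file name by locating href= in the processing instruction data and trimming the final quotation mark, which works when href is the last item and uses double quotes, as it does in Dom12.xml. As an alternative, and not the approach used in Dom12, the following sketch pulls a named pseudo-attribute out of the data with a regular expression, so the order of the items and the kind of quotation mark no longer matter. The class name StylesheetHref and the method pseudoAttr are hypothetical, and the sketch makes no attempt to handle character references inside the values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

//Hypothetical helper, not part of Dom12.
public class StylesheetHref{

  //Matches name="value" or name='value'.
  // Group 1 is the name, group 3 is a
  // double-quoted value, and group 4 is a
  // single-quoted value.
  private static final Pattern PSEUDO_ATTR =
      Pattern.compile(
        "(\\w[\\w.-]*)\\s*=\\s*"
        + "(\"([^\"]*)\"|'([^']*)')");

  //Returns the value of the named pseudo-attribute
  // in the processing instruction data, or null if
  // it is not present.
  static String pseudoAttr(String data,String name){
    Matcher m = PSEUDO_ATTR.matcher(data);
    while(m.find()){
      if(m.group(1).equals(name)){
        return m.group(3) != null
                          ? m.group(3) : m.group(4);
      }//end if
    }//end while loop
    return null;
  }//end pseudoAttr

  public static void main(String[] args){
    String data =
        "type=\"text/xsl\" href=\"Dom12.xsl\"";
    System.out.println(pseudoAttr(data,"href"));
  }//end main
}//end class StylesheetHref
```

In a program shaped like Dom12, the data string would come from the getData method of the ProcessingInstruction node whose target is xml-stylesheet.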


<?xml version="1.0"?>

<!DOCTYPE top [
<!ELEMENT top (theData)*>
<!ELEMENT theData (title,author,price)*>
<!ELEMENT title (#PCDATA | subtitle)*>
<!ELEMENT author (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT subtitle (#PCDATA | part1 | part2)*>
<!ELEMENT part1 (#PCDATA)>
<!ELEMENT part2 (#PCDATA)>
<!ATTLIST theData attr CDATA #IMPLIED>
<!ATTLIST subtitle position CDATA #IMPLIED>
]>

<!-- File Dom12.xml
Copyright 2003 R. G. Baldwin
-->

<!--Two of the following proc instr were included
to test the ability of the program to find the
actual stylesheet proc instr.-->
<?dummy-target dummy-data="def"?>
<?xml-stylesheet
type="text/xsl" href="Dom12.xsl"?>
<?false-target false-data="ghi"?>

<top>

<theData attr="Dummy Attr Value">
<title>Java
<subtitle position="Low">really
<part1>This is part 1</part1>
<part2>This is part 2</part2>
</subtitle>
rules</title>
<author>R.Baldwin</author>
<price>$9.95</price>
</theData>

<theData>
<title>Python</title>
<author>R.Baldwin</author>
<price>$15.42</price>
</theData>

<theData>
<title>XML</title>
<author>R.Baldwin</author>
<price>$19.60</price>
</theData>

</top>

Listing 25


<?xml version='1.0'?>
<!-- File Dom12.xsl
Copyright 2003 R. G. Baldwin
-->
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

<xsl:template match="/">
A Match Root
<xsl:apply-templates select="top" />
</xsl:template>

<xsl:template match="top">
B Match top
<xsl:apply-templates select="theData" />
</xsl:template>

<xsl:template match="theData">
C Match theData and show attribute
<xsl:value-of select="@attr" />
<xsl:apply-templates select="title" />
</xsl:template>

<xsl:template match="title">
D Match title and show value of title as context
<xsl:value-of select="." />
E Show value of subtitle
<xsl:value-of select="subtitle" />
<xsl:apply-templates select="subtitle" />
</xsl:template>

<xsl:template match="subtitle">
F match subtitle and show value of attribute
<xsl:value-of select="@position" />
G Show value of subtitle as context node
<xsl:value-of select="." />
</xsl:template>

</xsl:stylesheet>

Listing 26



Copyright 2004, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.

About the author

Richard Baldwin is a college professor (at Austin Community College in Austin, TX) and private consultant whose primary focus is a combination of Java, C#, and XML. In addition to the many platform and/or language independent benefits of Java and C# applications, he believes that a combination of Java, C#, and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects, and he frequently provides onsite training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin's Programming Tutorials, which has gained a worldwide following among experienced and aspiring programmers. He has also published articles in JavaPro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.

Baldwin@DickBaldwin.com

-end-