... in Java by Richard G Baldwin

Java JAXP, Implementing Default XSLT Behavior in Java

Baldwin explains default XSLT behavior, and shows you how to write Java code that mimics that behavior. The resulting Java code serves as a skeleton for more advanced transformation programs.

Published: February 17, 2004
By Richard G. Baldwin

Java Programming Notes # 2206

Preface
Preview
Some Details Regarding XSLT
Discussion and Sample Code
Run the Program
Summary
What's Next?
Complete Program Listings

Preface

In this lesson, I will explain default XSLT behavior, and will show you how to write Java code that mimics that behavior. The resulting Java code serves as a skeleton for more advanced transformation programs.

What is JAXP?

JAXP is an API designed to help you write programs for creating and processing XML documents. JAXP is very important for many reasons, not the least of which is the fact that it is a critical part of Sun's Java Web Services Developer Pack (JWSDP). As you are probably already aware, web services are expected by many to be a very important aspect of the Internet of the future.

This lesson is one in a series designed to help you understand how to use JAXP and how to use the JWSDP.

The first lesson in this series was entitled Java API for XML Processing (JAXP), Getting Started. The previous lesson was entitled Java JAXP, Exposing a DOM Tree.

What is XML?

XML is an acronym for the eXtensible Markup Language. I will assume that you already understand XML, and will teach you how to use JAXP to write programs for creating and processing XML documents.

What are XSL and XSLT?

I provided quite a lot of background material on XSL and XSLT in a previous lesson in this series. A brief review of that material follows.

XSL is an acronym for Extensible Stylesheet Language. XSLT is an acronym for XSL Transformations. The W3C is a governing body that has published many important documents on XML, XSL, and XSLT.

The uses of XSLT include the following:

Transforming non-XML documents into XML documents.
Transforming XML documents into other XML documents.
Transforming XML documents into non-XML documents.

Viewing tip

You may find it useful to open another copy of this lesson in a separate browser window. That will make it easier for you to scroll back and forth among the different listings and figures while you are reading about them.

Supplementary material

I recommend that you also study the other lessons in my extensive collection of online Java and XML tutorials. You will find those lessons published at Gamelan.com. As of the date of this writing, Gamelan doesn't maintain a consolidated index of my tutorial lessons, and sometimes they are difficult to locate there. You will find a consolidated index at www.DickBaldwin.com.

Preview

A tree structure in memory

A DOM parser can be used to create a tree structure in memory that represents an XML document. In Java, that tree structure is encapsulated in an object of the interface type Document. Document and its superinterface Node declare numerous methods that can be used to navigate, extract information from, modify, and otherwise manipulate the DOM tree. As is always the case, classes that implement Document must provide concrete definitions of those methods.
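
As a minimal sketch of that first step (not part of the Dom11 program), the following code uses the standard JAXP DocumentBuilder to parse XML text into a Document object. The class name ParseToDom and the sample XML string are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParseToDom {
    // Parse XML text into a DOM tree; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Convenience for the demo: the name of the root element of the document.
    static String rootElementName(String xml) {
        return parse(xml).getDocumentElement().getNodeName();
    }

    public static void main(String[] args) {
        // Hypothetical sample input, loosely modeled on a book database.
        String xml = "<top><theData><title>Java</title></theData></top>";
        Document doc = parse(xml);
        System.out.println(doc.getNodeName());    // name of the document node
        System.out.println(rootElementName(xml)); // name of the root element
    }
}
```

The Document object returned by parse is the entry point for all of the navigation and manipulation methods mentioned above.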

Many operations are possible

Given an object of type Document, there are many methods that can be invoked on the object to perform a variety of operations. For example, it is possible to write Java code to move nodes from one location in the tree to another location in the tree, thus rearranging the structure of the XML document represented by the Document object. It is possible to delete nodes, and to insert new nodes. It is also possible to recursively traverse the tree, extracting information about the nodes along the way.
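
The following sketch illustrates two of those operations, moving a node and deleting a node, using only standard methods of the Node interface. The class name RearrangeDom, the helper method names, and the sample XML are hypothetical and are not taken from the programs in this series.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RearrangeDom {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Comma-separated names of the children of 'parent', in document order.
    static String childNames(Node parent) {
        StringBuilder names = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (names.length() > 0) names.append(',');
            names.append(n.getNodeName());
        }
        return names.toString();
    }

    // Move the last child of the root element to the front.
    static String moveLastChildFirst(String xml) {
        Element root = parse(xml).getDocumentElement();
        // insertBefore() detaches a node that is already in the tree before
        // inserting it at the new location, so this is a move, not a copy.
        root.insertBefore(root.getLastChild(), root.getFirstChild());
        return childNames(root);
    }

    // Delete the first child of the root element.
    static String removeFirstChild(String xml) {
        Element root = parse(xml).getDocumentElement();
        root.removeChild(root.getFirstChild());
        return childNames(root);
    }

    public static void main(String[] args) {
        String xml = "<top><title/><author/><price/></top>";
        System.out.println(moveLastChildFirst(xml)); // price,title,author
        System.out.println(removeFirstChild(xml));   // author,price
    }
}
```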

Two ways to transform an XML document

There are at least two ways to transform the contents of an XML document into another document:

By writing Java code to manipulate the DOM and perform the transformation.
By using XSLT to perform the transformation.

It should be possible to write Java code to perform any transformation that can be performed using XSLT, but the reverse may not be true.

General description of XSLT

Here is a partial quotation from XML In A Nutshell (which I highly recommend) by Elliotte Rusty Harold and W. Scott Means. This quotation provides a general description of XSLT:

"... (XSLT) is a functional programming language used to specify how an input XML document is converted into another text document -- possibly, though not necessarily, another XML document. An XSLT processor reads both an input XML document and an XSLT stylesheet (which is itself an XML document because XSLT is an XML application) and produces a result tree as output. ... Documents can be transformed using a standalone program or as part of a larger program that communicates with the XSLT processor through its API."

In this lesson, I will provide and explain a larger program that communicates with the XSLT processor through its API. The program will also execute Java code that mimics the transformation provided by XSLT.
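
To give a feel for what communicating with the XSLT processor through its API looks like, here is a minimal sketch that uses the standard JAXP classes TransformerFactory and Transformer. It is not the Dom11 program. The class name XsltViaApi, the small stylesheet, and the sample XML are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltViaApi {
    // Hypothetical stylesheet: wraps the title text in a <book> element.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:template match='/'>"
          + "<book><xsl:value-of select='top/title'/></book>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Apply the stylesheet text 'xsl' to the document text 'xml'.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<top><title>Java</title></top>";
        System.out.println(transform(xml, STYLESHEET));
    }
}
```

In a real program the two StreamSource objects would more likely wrap files than strings. Strings are used here only to keep the sketch self-contained.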

Advantages and disadvantages

As is usually the case, there are advantages and disadvantages to both approaches to document transformation.

As an example of an advantage provided by XSLT, if it is possible to perform the required transformation using XSLT, that approach will probably require you to write less code than would be required to perform the same transformation by writing a Java program from scratch.

A large library of functions

With the XSLT transformation process, you write a stylesheet, which is somewhat analogous to a driver program in a more conventional programming environment. That driver program accesses and uses functions from a large library of pre-written functions to perform a series of well-defined operations on the DOM tree to produce the desired transformation.

(XSLT authors don't call them functions. Rather, they are called XSLT elements. According to XML In A Nutshell, there are 37 standard XSLT elements. Also according to XML In A Nutshell, most XSLT processors also provide various nonstandard extension elements and allow you to write your own extension elements in languages such as Java.)

Is there a similar library of Java methods?

I am not aware of a library of Java methods in the public domain that emulates the 37 standard XSLT elements. However, I freely admit that such a library may exist and I may simply not know about it.

Therefore, to write a Java program that emulates an XSLT transformation, you need to either

Create your own library of Java methods and use that library with your Java code to perform the transformation, or
Start from scratch each time and write a custom program to perform the transformation.

A skeleton library of Java methods

This lesson, and several lessons to follow this one, will show you how to write the skeleton of a Java library containing methods that emulate the most common XSLT elements. Once you have the library, writing Java code to transform XML documents consists simply of writing a short driver program to access and use those methods. Thus, given the proper library of methods, it is no more difficult to write a driver Java program to perform the transformation than it is to write an XSLT stylesheet.

Library is not my primary purpose

However, my primary purpose in these lessons is not to provide such a library, but rather is to help you understand how to use a DOM tree to create, modify, and manipulate XML documents. By comparing Java code that manipulates a DOM tree with similar XSLT operations, you will have an opportunity to learn a little about XSLT in the process of learning how to manipulate a DOM tree using Java code.

If you already know a lot about XSLT, you may learn a little about Java by studying these lessons. If you already know a lot about Java, you may learn a little about XSLT. If you don't already know either Java or XSLT, you may learn a little about both.

Debugging XSLT can be difficult

While writing a Java program to emulate an XSLT transformation may require you to write more code than writing a stylesheet, in my opinion it is much easier to debug a Java program that fails to deliver the desired result than it is to debug an XSL stylesheet that fails to deliver. This is an advantage of using Java code over XSLT. I find XSLT to be extremely difficult to debug (but I haven't attempted to use a fancy XSLT debugger, several of which are freely available on the Internet).

Java provides more detailed control

Another difference in using Java code relative to XSLT has to do with the detailed control of the transformation process. I believe (but cannot prove) that it is possible to write Java programs to provide transformations that are not possible using standard XSLT elements. If I am correct, this may be another advantage of writing Java code over using XSLT.

Some Details Regarding XSLT

The following is a partial quotation from XML In A Nutshell. (Note that I will be referring to this excellent book several more times in this lesson. For brevity, I will refer to it simply as Nutshell.)

"XSLT is an XML application for specifying rules by which one XML document is transformed into another XML document. An XSLT document -- that is, an XSLT stylesheet -- contains template rules. Each template rule has a pattern and a template. An XSLT processor compares the elements and other nodes in an input XML document to the template-rule patterns in a stylesheet. When one matches, it writes the template from that rule into the output tree. ... XSLT uses the XPath syntax to identify matching nodes."

My explanation

Let's see if I can explain this process in my own words. Assume that an XML document has been parsed so as to produce a DOM tree in memory that represents the XML document. (The creation of a DOM tree in this manner was discussed in several previous lessons in this series.)

An XSLT processor starts examining the DOM tree at its root node. It obtains instructions from the XSLT stylesheet telling it how to navigate the tree, and what to do with each node that it encounters along the way.

Finding matching template rules

As each node is encountered, the processor searches the stylesheet looking for instructions on how to treat that node. (These instructions will be referred to later as template rules.) If the processor finds instructions that match the node type, it performs the operations indicated by the instructions. If it doesn't find matching instructions, it executes built-in instructions appropriate to that node.

(An XML document can contain seven different types of nodes. The different types will be identified later. This lesson will describe and explain the built-in instructions for six of those seven node types. Java code will be developed that emulates the built-in instructions for each of the six types of nodes.)
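
The search-then-fall-back behavior described above can be sketched in Java as a lookup in a table of custom rules, with built-in behavior used when the lookup fails. This is a simplified illustration under stated assumptions, not the design of the Dom11 program: the class name RuleLookup, the method names, and the idea of keying rules by element name are all hypothetical, and only element, document, and text nodes are handled.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RuleLookup {
    // Hypothetical rule table: element name -> text to emit for that element.
    private final Map<String, Function<Element, String>> customRules =
            new HashMap<>();

    void addRule(String elementName, Function<Element, String> rule) {
        customRules.put(elementName, rule);
    }

    // Use a custom rule when one matches; otherwise use built-in behavior.
    String process(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                Function<Element, String> rule =
                        customRules.get(node.getNodeName());
                return rule != null ? rule.apply((Element) node)
                                    : processChildren(node);
            case Node.DOCUMENT_NODE:
                return processChildren(node); // built-in: descend into children
            case Node.TEXT_NODE:
                return node.getNodeValue();   // built-in: copy the text
            default:
                return "";                    // not handled in this sketch
        }
    }

    private String processChildren(Node parent) {
        StringBuilder out = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            out.append(process(c));
        }
        return out.toString();
    }

    // Demo: one custom rule for <price>; everything else falls back.
    static String demo(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            RuleLookup rules = new RuleLookup();
            rules.addRule("price", el -> "[" + el.getTextContent() + "]");
            return rules.process(doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        // prints Java[$9.95]
        System.out.println(
                demo("<top><title>Java</title><price>$9.95</price></top>"));
    }
}
```

Matching by element name alone is far cruder than matching by XPath pattern, but it is enough to show how custom and built-in rules can combine to produce the output.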

Establishing the context node

An XPath expression can be used to point to a specific node and to establish that node as the context node. Once a context node is established, there are at least two XSLT elements that can be used to manage the traversal among children of that node:

xsl:apply-templates
  select, optional attribute
  mode, optional attribute
  xsl:sort, optional XSLT element
xsl:for-each
  select, required attribute
  xsl:sort, optional XSLT element

The xsl:apply-templates XSLT element

The first of these, xsl:apply-templates, examines and processes all child nodes of the context node that match an optional select attribute.

(When combined with a default template rule to be discussed later, this often results in a recursive examination and processing of all descendant nodes of the context node.)

According to Nutshell,

"The xsl:apply-templates instruction tells the processor to search for and apply the highest-priority template in the stylesheet that matches each node identified by the select attribute."

Applying template rules

As each node is examined, the processor searches the stylesheet to determine if the XSLT programmer has provided a template rule that matches the node and defines how that node should be treated. If a matching template rule is found, the node is treated in the manner prescribed by the template rule.

Literal text in the XSLT stylesheet elements

You can think of the XSLT process as operating on an input DOM tree to produce an output DOM tree. If the template rule being applied contains literal text, that literal text is used to create a text node in the output tree.

(I will explain how this feature is used to transform XML documents into XHTML documents in a future lesson.)

If no match is found

If a matching template rule is not found, the processor executes a built-in template rule appropriate to the type of node involved. Built-in template rules are provided by the XSLT processor to handle the seven different types of nodes in an XML document:

root node
element node
attribute node
text node
comment node
processing instruction node
namespace node

This lesson will explain the built-in rules that handle the first six types of nodes in the above list.

Recursion is common

As mentioned earlier, the combination of xsl:apply-templates and a built-in template rule often produces recursion. Assuming that there is nothing in a matching template rule that stops the recursion operation, recursion continues until all descendant nodes of the original context node have been examined and processed.

The mode attribute

The mode attribute of xsl:apply-templates makes it possible to cause different template rules to match nodes of the same type at different places in the DOM tree.

Sorting

The optional xsl:sort element makes it possible to modify the order in which the nodes are examined.

Iterative operation

The second XSLT element in the above list, xsl:for-each, executes an iterative examination and processing of all child nodes of the context node that match the required select attribute. According to Nutshell,

"The xsl:for-each instruction iterates over the nodes identified by its select attribute and applies templates to each one."

In other words, the processor will visit each node identified by the select attribute (typically child nodes of the context node). As each node is visited, it becomes the current node, and the processor instantiates the template contained in the body of the xsl:for-each element for that node. Unlike xsl:apply-templates, xsl:for-each does not itself search the stylesheet for a matching template rule. Template rules in the stylesheet, and the built-in template rules, come into play when the body of the xsl:for-each element contains an xsl:apply-templates element.

As before, the optional xsl:sort element makes it possible to modify the order in which the nodes are examined. I will explain this in detail in a future lesson.
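
As a rough Java analogy for this kind of iterative operation, the following sketch selects child elements by name, optionally sorts them, and then applies a piece of processing to each one. It is a simplification under stated assumptions: the class name ForEachSketch and its method names are hypothetical, the select argument is a plain element name rather than an XPath expression, and the per-node processing is passed in as a function.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ForEachSketch {
    // Child elements of 'context' whose name equals 'select'
    // (a very reduced stand-in for an XPath select expression).
    static List<Element> selectChildren(Node context, String select) {
        List<Element> selected = new ArrayList<>();
        for (Node c = context.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE
                    && c.getNodeName().equals(select)) {
                selected.add((Element) c);
            }
        }
        return selected;
    }

    // Visit each selected node, optionally in sorted order, and apply 'body'.
    static String forEach(Node context, String select,
                          Comparator<Element> sort,
                          Function<Element, String> body) {
        List<Element> nodes = selectChildren(context, select);
        if (sort != null) {
            nodes.sort(sort); // plays the role of an optional xsl:sort
        }
        StringBuilder out = new StringBuilder();
        for (Element e : nodes) {
            out.append(body.apply(e));
        }
        return out.toString();
    }

    // Demo: list the title text of each <title> child in alphabetical order.
    static String demo(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return forEach(doc.getDocumentElement(), "title",
                    Comparator.comparing(Element::getTextContent),
                    t -> t.getTextContent() + ";");
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        // prints Java;Python;XML;
        System.out.println(demo("<top><title>Python</title>"
                + "<title>Java</title><title>XML</title></top>"));
    }
}
```

Passing null for the sort argument leaves the nodes in document order.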

Combined operations

Frequently a stylesheet will combine recursive and iterative operations to produce more complex operations.

Enough talk, let's see some code

I will begin by discussing the XML file named Dom11.xml (shown in Listing 29) along with the XSL stylesheet file named Dom11.xsl (shown in Listing 30). These two listings are provided near the end of the lesson.

After explaining the transformation produced by applying this stylesheet to this XML document, I will explain the transformation produced by applying the empty stylesheet named Dom11a.xsl (shown in Listing 33) to a nearly identical XML document.

(The two XML files are the same except that they refer to different stylesheet files, one of which is empty.)

A Java program named Dom11

Following that, I will explain a Java program (shown in Listing 31) that emulates the behavior of the stylesheets shown in Listings 30 and 33 when applied to the XML file shown in Listing 29.

I will explain that the Java program shown in Listing 31 emulates the behavior of the empty stylesheet shown in Listing 33, and will explain why that is true.

Discussion and Sample Code

The XML file named Dom11.xml

The XML file shown in Listing 29 is relatively straightforward. A tree view of that XML file is shown in Figure 1.

(The program named DomTree02, discussed in an earlier lesson, was used to produce this tree view of the XML file.

The values of the text nodes in Figure 1 were manually highlighted in red to make it easier to refer to those values later in this lesson.)

#document DOCUMENT_NODE
  top DOCUMENT_TYPE_NODE
  #comment COMMENT_NODE
  #comment COMMENT_NODE
  dummy-target PROCESSING_INSTRUCTION_NODE
  xml-stylesheet PROCESSING_INSTRUCTION_NODE
  false-target PROCESSING_INSTRUCTION_NODE
  top ELEMENT_NODE
    theData ELEMENT_NODE
        Attribute: attr=Dummy Attr Value
      title ELEMENT_NODE
        #text Java

        subtitle ELEMENT_NODE
            Attribute: position=Low
          #text really
        #text rules

      author ELEMENT_NODE
        #text R.Baldwin
      price ELEMENT_NODE
        #text $9.95
    theData ELEMENT_NODE
      title ELEMENT_NODE
        #text Python
      author ELEMENT_NODE
        #text R.Baldwin
      price ELEMENT_NODE
        #text $15.42
    theData ELEMENT_NODE
      title ELEMENT_NODE
        #text XML
      author ELEMENT_NODE
        #text R.Baldwin
      price ELEMENT_NODE
        #text $19.60

Figure 1
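
For readers who do not have the earlier lesson at hand, the following sketch shows one way that a tree view similar to Figure 1 could be produced by a recursive traversal. It is not the DomTree02 program. The class name TreeView, the output layout, and the decision to hide whitespace-only text nodes are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TreeView {
    static String render(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Print one node, then its attributes, then recurse into its children.
    private static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue();
            if (text.trim().isEmpty()) return; // assumption: hide blank text
            indent(out, 2 * depth).append("#text ").append(text).append('\n');
            return;
        }
        indent(out, 2 * depth).append(node.getNodeName()).append(' ')
                .append(typeName(node.getNodeType())).append('\n');
        NamedNodeMap attrs = node.getAttributes(); // null unless an element
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                indent(out, 2 * depth + 4).append("Attribute: ")
                        .append(a.getNodeName()).append('=')
                        .append(a.getNodeValue()).append('\n');
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, depth + 1, out);
        }
    }

    private static StringBuilder indent(StringBuilder out, int spaces) {
        for (int i = 0; i < spaces; i++) out.append(' ');
        return out;
    }

    private static String typeName(short type) {
        switch (type) {
            case Node.DOCUMENT_NODE: return "DOCUMENT_NODE";
            case Node.DOCUMENT_TYPE_NODE: return "DOCUMENT_TYPE_NODE";
            case Node.ELEMENT_NODE: return "ELEMENT_NODE";
            case Node.COMMENT_NODE: return "COMMENT_NODE";
            case Node.PROCESSING_INSTRUCTION_NODE:
                return "PROCESSING_INSTRUCTION_NODE";
            default: return "type " + type;
        }
    }

    public static void main(String[] args) {
        System.out.print(render("<top><title a='1'>Java</title></top>"));
    }
}
```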

A database of books

As you may already have figured out, this XML document represents a small database containing information about books. However, the structure and content of this XML file was not intended to have any purpose other than to illustrate the default behavior of the built-in XSLT template rules.

The XSL stylesheet file named Dom11.xsl

The stylesheet file shown in Listing 30 is very important relative to the purpose of this lesson, so I will discuss it in detail.

Recall that an XSL stylesheet is itself an XML file, and can therefore be represented as a tree. I will begin by showing you an abbreviated version of a tree view of the stylesheet, as shown in Figure 2.

#document DOCUMENT_NODE
  xsl:stylesheet ELEMENT_NODE
      Attribute: xmlns:xsl=http:
                 //www.w3.org/1999/XSL/Transform
      Attribute: version=1.0

    xsl:template ELEMENT_NODE
        Attribute: match=*|/
      xsl:apply-templates ELEMENT_NODE

    xsl:template ELEMENT_NODE
        Attribute: match=text()|@*
      xsl:value-of ELEMENT_NODE
          Attribute: select=.

Figure 2

Why abbreviated?

The reason that I refer to this as an abbreviated version is because I manually deleted comment nodes and extraneous text nodes in order to emphasize the important elements in the document.

(Note that I also manually entered a line break in the third line to force the material to fit into this narrow publication format.)

The root element

The root node of all XML documents is the document node. However, in addition to the root node, there is also a root element.

As you can see from Figure 2, the root element in the XSL document is of type xsl:stylesheet. The root element has two attributes, each of which is standard for XSL stylesheets.

The first attribute points to the XSLT namespace URI, which you can read about in the W3C Recommendation. The second attribute provides the XSLT version. According to Nutshell, the version must be 1.0. Also, according to Nutshell,

"The namespace URI must be exactly correct. If even so much as a single character is wrong, the stylesheet processor will output the stylesheet itself instead of either the input document or the transformed input document."

Unable to verify this behavior

I have been unable to verify this behavior experimentally. When I delete a character from the XSL namespace URI and then load the XML file into IE 6.0, there is simply no output. The browser screen remains blank. When I modify the XSL namespace URI and attempt to use JAXP to apply the stylesheet to the XML file, the system throws several errors and the program aborts. Neither approach seems to "output the stylesheet itself" as indicated by Nutshell.

Children of the root element node

As you can see from Figure 2, the root element node has two child nodes, both of which are of type xsl:template. Here is what XSLT and XPath On The Edge by Jeni Tennison has to say about xsl:template:

"This element defines a template, which can be applied (if a match pattern is specified) or called (if a name is specified)."

As you can see from the attribute values in Figure 2, a match pattern is provided for both of the xsl:template nodes in Figure 2.

(The child nodes shown in Figure 2 are also called template rules.)

Back to basics

Getting back to XSLT basics, whenever the XSLT processor encounters a node while traversing the DOM tree, it will examine all of the template rules in the stylesheet searching for one whose match pattern matches the node. If it finds a matching template rule, it will execute the instructions contained as elements within the template rule. If it doesn't find a match, it will execute a built-in template rule that matches the node.

An explicit representation of a built-in template rule

Consider the first child node of the xsl:stylesheet root element in Figure 2. Listing 1 shows this template rule in XSL syntax (extracted from Listing 30).

<xsl:template match="*|/">
<xsl:apply-templates/>
</xsl:template>

Listing 1

The template rule shown in Listing 1 is an explicit representation of one of the built-in template rules.

Matching the root node and element nodes

Consider the match pattern for this template rule (the text value of the attribute named match). According to Nutshell,

"The asterisk * is an XPath wild-card pattern that matches all element nodes, regardless of what name they have or what namespace they're in.

The forward slash / is an XPath pattern that matches the root node.

This is the first node the processor selects for processing, and therefore this is the first template rule the processor executes (unless a nondefault template rule also matches the root node).

... the vertical bar combines these two expressions so that it matches both the root node and element nodes."

The <xsl:apply-templates/> element

Now consider the <xsl:apply-templates/> element that makes up the body of this template rule. This element causes the processor to process all child nodes of each matching node, examining nodes, searching for matching template rules, and executing the elements embedded in matching template rules along the way. Again, according to Nutshell, still speaking of the template rule in Listing 1,

"In isolation, this rule means that the XSLT processor eventually finds and applies templates to all nodes except attribute and namespace nodes because every nonattribute, non-namespace node is either the root node, a child of the root node, or a child of an element. Only attribute and namespace nodes are not children of their parents."

An explicit representation of a built-in template rule

Once again, the template rule shown in Listing 1 is an explicit representation of one of the built-in template rules. If I were to remove this template rule from the stylesheet, and then apply the stylesheet to the XML document, this template rule would still be applied where appropriate by the XSLT processor, because it is built into the processor.

Handling text nodes by default

Listing 2 shows the template rule, in XSL syntax, that corresponds to the second child node of the root element node in Figure 2. Once again, this is a template rule with a match pattern. This template rule is also an explicit representation of one of the built-in rules, which copies the value of text and attribute nodes into the output document.

<xsl:template match="text()|@*">
<xsl:value-of select="."/>
</xsl:template>

Listing 2

The match pattern

The text() in the value of the attribute named match is an XPath pattern matching all text nodes. The @* is an XPath pattern matching all attribute nodes. The vertical bar combines the two patterns. Hence, the template rule matches all text and all attribute nodes.

The xsl:value-of element

Once a match is made, the behavior of the rule is governed by the single element that is embedded in the rule. The xsl:value-of element, with a select value of ".", returns the text value of the context or current node. (This is similar to the use of a single period to represent the current directory in some file management systems such as MS-DOS.)

Text value to the output

Therefore, whenever the XSLT processor applies this template rule to a text or attribute node, the text value of that node is sent to the output document (a text node is created in the output tree).

If the node is a text node, the value is simply the text in the node.

If the node is an attribute node, the value is the attribute value, but not the attribute name.
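
In DOM terms, this behavior corresponds closely to the getNodeValue method of the Node interface. The following sketch, with the hypothetical class name ValueOfSketch and a sample element borrowed from the XML file, shows that the method returns the text for a text node and the value (not the name) for an attribute node. Other node types are deliberately left out of the sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ValueOfSketch {
    // Rough stand-in for the value-of behavior on text and attribute nodes.
    static String valueOf(Node node) {
        switch (node.getNodeType()) {
            case Node.TEXT_NODE:
            case Node.ATTRIBUTE_NODE:
                return node.getNodeValue();
            default:
                return ""; // other node types are outside this sketch
        }
    }

    // Demo: assumes the root element has a 'position' attribute and that
    // its first child is a text node.
    static String demo(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element e = doc.getDocumentElement();
            Node attribute = e.getAttributeNode("position");
            Node text = e.getFirstChild();
            return valueOf(attribute) + "|" + valueOf(text);
        } catch (Exception ex) {
            throw new IllegalStateException("Could not process XML", ex);
        }
    }

    public static void main(String[] args) {
        // prints Low|really
        System.out.println(demo("<subtitle position='Low'>really</subtitle>"));
    }
}
```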

The output

Now it's time for the big question. What does the output look like when the stylesheet shown in Listing 30 is used to transform the XML document shown in Listing 29? The result of such a transformation is shown in Figure 3.

(Note that I manually inserted a line break near the end of the fourth line in Figure 3 to force the material to fit in this narrow publication format. This caused the text $19.60 to move down to the fifth line.)

<?xml version="1.0" encoding="UTF-8"?>
Java
reallyrules
R.Baldwin$9.95PythonR.Baldwin$15.42XMLR.Baldwin
$19.60

Figure 3

The XML declaration

The first line in Figure 3 is an XML declaration that was placed there by the XSLT processor independent of the content of the XML file.

The text in the output

If you compare the text in Figure 3 with the material highlighted in red in Figure 1, you will see that the output produced by this stylesheet containing only explicit representations of default template rules is the concatenation of the values of all the text nodes in the XML document.

Line breaks in the output

The two line breaks following the words Java and rules in Figure 3 correspond to the line breaks in the text portion of the title element shown in Listing 3. (This element was extracted from the original XML file in Listing 29.)

<title>Java
<subtitle position="Low">really</subtitle>rules
</title>

Listing 3

Because these two line breaks occur within the text portion of the element, they also appear in the output in Figure 3. In other words, the line breaks are considered by the XSLT processor to be a legitimate part of the text content of the element.

The remaining line breaks in the XML file shown in Listing 29 occur between XML tags. Therefore, they are not considered to be a part of the text content of any element and they do not appear in Figure 3.

No attribute values in the output

You may have noticed that even though a couple of the elements in the XML file have attributes (see Figure 1), and one of the template rules matches attribute nodes, the attribute values do not appear in the output shown in Figure 3. Nutshell explains this in the following way:

"... although this template rule says what should happen when an attribute node is reached, by default the XSLT processor never reaches attribute nodes and, therefore, never outputs the value of an attribute."

Nutshell goes on to tell us,

"Attribute values are output according to this template only if a specific rule applies templates to them, and none of the default rules do this because attributes are not considered to be children of their parents. In other words, if element E has an attribute A, then E is the parent of A, but A is not the child of E."

Finally, Nutshell tells us,

"Applying templates to the children of an element with <xsl:apply-templates/> does not apply templates to attributes of the element. To do that, the xsl:apply-templates element must contain an XPath expression specifically selecting attributes."

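The DOM API reflects the same separation between children and attributes. As a minimal sketch (the class name AttributesNotChildren and its method names are hypothetical), the following code shows that the attribute of an element does not appear among the nodes returned by getChildNodes, and must instead be reached explicitly through getAttributes, much as a stylesheet must explicitly select attributes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class AttributesNotChildren {
    // Number of child nodes; attributes are never counted here.
    static int childCount(Element e) {
        return e.getChildNodes().getLength();
    }

    // Explicitly visit the attributes and concatenate their values.
    static String attributeValues(Element e) {
        NamedNodeMap attrs = e.getAttributes();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < attrs.getLength(); i++) {
            out.append(attrs.item(i).getNodeValue());
        }
        return out.toString();
    }

    static String demo(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element e = doc.getDocumentElement();
            return childCount(e) + ":" + attributeValues(e);
        } catch (Exception ex) {
            throw new IllegalStateException("Could not process XML", ex);
        }
    }

    public static void main(String[] args) {
        // One child (title), and the attribute value reached separately.
        // prints 1:Dummy Attr Value
        System.out.println(demo(
                "<theData attr='Dummy Attr Value'><title>Java</title></theData>"));
    }
}
```

A traversal that only follows child nodes will therefore never visit an attribute, which is consistent with the absence of attribute values in Figure 3.
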
Applying an empty stylesheet

Now consider the stylesheet shown in Listing 33, as shown in abbreviated tree format in Figure 4.

(As was the case with Figure 2, comment nodes and extraneous text nodes were manually removed from Figure 4.)

#document DOCUMENT_NODE
  xsl:stylesheet ELEMENT_NODE
      Attribute: xmlns:xsl=http:
                 //www.w3.org/1999/XSL/Transform
      Attribute: version=1.0

Figure 4

Unlike Figure 2, the stylesheet represented by Figure 4 doesn't contain any template rules. In fact, except for the root (document) node and the xsl:stylesheet root element node, the stylesheet is completely empty.

Produces exactly the same output

However, the result of applying the empty stylesheet to the XML file discussed earlier produces exactly the same result as was produced by applying the stylesheet shown in Listing 30 and Figure 2 to that XML file.

This is because the two template rules shown in Listing 30 and Figure 2 replicate the behavior of two of the built-in template rules. Therefore, removing them from the stylesheet has no impact on the result produced by applying the stylesheet to the XML file. If they are needed, they are available as built-in rules of the XSLT processor.
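
You can check this equivalence for yourself with a few lines of JAXP code. The following sketch applies both a stylesheet containing the two explicit rules and an empty stylesheet to the same small document and compares the results. It is not the Dom11 program. The class name EmptyStylesheetDemo, the inline stylesheet strings, and the sample XML are assumptions made for illustration, and the exact form of the XML declaration in the output may depend on the XSLT processor in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EmptyStylesheetDemo {
    static final String NS = "http://www.w3.org/1999/XSL/Transform";

    // The two explicit rules from Listings 1 and 2.
    static final String EXPLICIT =
            "<xsl:stylesheet version='1.0' xmlns:xsl='" + NS + "'>"
          + "<xsl:template match='*|/'><xsl:apply-templates/></xsl:template>"
          + "<xsl:template match='text()|@*'>"
          + "<xsl:value-of select='.'/></xsl:template>"
          + "</xsl:stylesheet>";

    // A stylesheet with no template rules at all.
    static final String EMPTY =
            "<xsl:stylesheet version='1.0' xmlns:xsl='" + NS + "'/>";

    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    // True if both stylesheets produce identical output for 'xml'.
    static boolean sameOutput(String xml) {
        return transform(xml, EXPLICIT).equals(transform(xml, EMPTY));
    }

    public static void main(String[] args) {
        String xml = "<top><theData attr='Dummy Attr Value'>"
                + "<title>Java</title><price>$9.95</price></theData></top>";
        System.out.println(transform(xml, EMPTY));
        System.out.println("Same output: " + sameOutput(xml));
    }
}
```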

Transformation behavior of an empty stylesheet

Because the two template rules in the previous stylesheet replicate the behavior of two of the built-in template rules, removing those template rules from the stylesheet to produce an empty stylesheet had absolutely no impact on the transformation result. The transformation result produced by the previous stylesheet was identical to that produced by the empty stylesheet.

According to Nutshell, when you transform an XML document using an empty stylesheet,

"... the output consists of a text declaration plus the text of the input document. ... Markup from the input document has been stripped. The net effect of applying an empty stylesheet ... to any input XML document is to reproduce the content but not the markup of the input document. To change that, we'll need to add template rules to the stylesheet telling the XSLT processor how to handle the specific elements in the input document. In the absence of explicit template rules, an XSLT processor falls back on built-in rules ..."

Combined output

Whenever the XSLT processor encounters a node for which you haven't defined a matching template rule, the default template rule for that type of node will be applied. Therefore, the total output is often a combination of output produced by template rules that you provide and built-in template rules.

Therefore, if you are going to create a stylesheet containing template rules of your own design, it is very important for you to understand the default behavior provided by the built-in template rules. The total output produced by your stylesheet is very likely to be a combination of the output produced by your template rules and the output produced by the built-in template rules.

Other built-in template rules

I have explained the behavior of the built-in template rules that cover the following four types of nodes:

root node
element node
attribute node
text node

I will explain the behavior of the built-in template rules that cover the following two types of nodes later in this lesson:

comment node
processing instruction node

I will also have some comments about namespace nodes later in this lesson.

A Java program that emulates the built-in template rules

Now let's change direction and concentrate on Java code rather than XSLT elements. The following paragraphs describe a Java program named Dom11.

The primary purposes of this lesson are to:

Demonstrate Java code that replicates the behavior of the built-in template rules for six of the seven possible types of nodes.
Provide a skeleton program that can be expanded later to provide more complex behavior.

This program implements six built-in template rules for an XSLT processor. In addition, it emulates several XSLT elements that are required to support the built-in rules, such as xsl:value-of and xsl:apply-templates.

As such, the program serves as the skeleton for the definition of custom template rules.

Behavior of the program

As written, this program extracts and concatenates all text values from a specified XML file, and writes that text into a result file, using two different approaches:

An XSLT transformation operating under program control.
Program code that emulates the behavior of the XSLT transformation.

In particular, this program illustrates Java code that emulates the XSLT templates in the files named Dom11.xsl and Dom11a.xsl. These two XSL files differ in terms of their dependence on the built-in templates.

As you saw in the earlier discussion, both XSL files produce the same result when processed against the XML files named Dom11.xml and Dom11a.xml, demonstrating the behavior of the built-in template rules. The execution of these built-in template rules causes the contents of every text node to be concatenated and written into the result file.

The program code in this program emulates those built-in template rules and produces the same results.

Usage instructions

The program requires three command line arguments in the following order:

The name of the input XML file - must be Dom11.xml or Dom11a.xml.
The name of the output file to be produced by the XSLT transformation.
The name of the output file to be produced by the program code that emulates the XSLT transformation.

Order of execution

The program begins by executing code to transform the incoming XML file in a way that mimics the XSLT transformation. Along the way, it saves the processing instructions (one of which contains the name of the stylesheet file) for later use by the code that governs the XSLT transformation process. (Otherwise, the code that performs the XSLT transformation later would have to search the DOM tree for the XSL stylesheet file name.)

The name of the XSL stylesheet file is then extracted from the saved processing instruction, and the program uses that stylesheet to transform the XML file into a result file.

Errors, exceptions, and testing

No effort was made to provide meaningful information about errors and exceptions. If an error or exception occurs, the default behavior for that error or exception will occur.

The program was tested using SDK 1.4.2 under WinXP.

Will discuss in fragments

I will discuss this program in fragments. A complete listing of the program is shown in Listing 31 near the end of the lesson.

Listing 4 shows the beginning of the class named Dom11 and the beginning of the main method.

public class Dom11{

PrintWriter out;//output stream
//Save processing instruction nodes here
static Vector procInstr = new Vector();

public static void main(String argv[]){
if (argv.length != 3){
System.err.println(
"usage: java Dom11 "
+ "xmlFileIn "
+ "xformFileOut "
+ "codeFileOut");
System.exit(0);
}//end if

Listing 4

The code in Listing 4 declares a couple of variables, one of which will be used later to save processing instruction nodes.

Then the code in Listing 4 provides usage instructions based on command-line arguments.

Parse the input XML file

The code in Listing 5 parses the input XML file, producing an object of type Document, which is a DOM tree in memory.

try{
//Get a factory object for DocumentBuilder
// objects
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();

//Configure the factory object. Change
// the following parameter to false for a
// non-validating parser.
factory.setValidating(true);
factory.setNamespaceAware(false);
//The following statement causes the parser
// to ignore cosmetic whitespace between
// elements.
factory.
setIgnoringElementContentWhitespace(true);

//Get a DocumentBuilder (parser) object
DocumentBuilder builder =
factory.newDocumentBuilder();

//Parse the XML input file to create a
// Document object that represents the
// input XML file.
Document document = builder.parse(
new File(argv[0]));

Listing 5

Steps for creating a Document object

There is nothing new in the code in Listing 5. I have discussed the code required to create a Document object in several previous lessons beginning with the lesson entitled Java API for XML Processing (JAXP), Getting Started.

As you saw in those earlier lessons, creating a Document object involves three steps:

Create a DocumentBuilderFactory object
Use the DocumentBuilderFactory object to create a DocumentBuilder object
Use the DocumentBuilder object to create a Document object

Both the DocumentBuilderFactory class and the DocumentBuilder class belong to the javax.xml.parsers package. As of this writing, this package is part of J2SE 1.4.2.
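
As a minimal sketch of the three steps, and of the effect of the setIgnoringElementContentWhitespace setting used in Listing 5, the following hypothetical class parses a small in-memory document instead of a file. The class name, the sample XML, and its internal DTD are assumptions made for illustration. The DTD matters because the parser can only recognize whitespace as ignorable when it knows that an element's content model contains only elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the Dom11 program.
class ParseStepsSketch {

  // Sample document with an internal DTD. Element a may
  // contain only a b element, so the whitespace around b
  // is cosmetic.
  static final String XML =
      "<!DOCTYPE a [<!ELEMENT a (b)>"
      + "<!ELEMENT b (#PCDATA)>]>\n"
      + "<a>\n  <b>text</b>\n</a>";

  // Returns the number of child nodes of the root element.
  static int childCountOfRoot(boolean ignoreWhitespace) {
    try {
      // Step 1: create the factory and configure it.
      DocumentBuilderFactory factory =
          DocumentBuilderFactory.newInstance();
      factory.setValidating(true);
      factory.setNamespaceAware(false);
      factory.setIgnoringElementContentWhitespace(
          ignoreWhitespace);

      // Step 2: use the factory to create the parser.
      DocumentBuilder builder =
          factory.newDocumentBuilder();

      // Step 3: use the parser to create the Document.
      Document document = builder.parse(
          new InputSource(new StringReader(XML)));

      return document.getDocumentElement()
          .getChildNodes().getLength();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(childCountOfRoot(true));
    System.out.println(childCountOfRoot(false));
  }
}
```

With the setting enabled, the root element should report only the b element as a child. With it disabled, the whitespace-only text nodes on either side of b are also counted.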

Transformation through program code

The code in Listing 6 begins the process of transforming the DOM tree into an output file through the execution of program code (as opposed to an XSLT transformation).

The code begins by instantiating a new object of the Dom11 class.

Dom11 thisObj = new Dom11();

thisObj.out = new PrintWriter(
new FileOutputStream(argv[2]));

Listing 6

Get an output stream

Then the program gets an output stream for the output produced by the program code. This stream points to an output file that was specified by the third command-line parameter.

Process the DOM tree

The code in Listing 7 invokes the processDocumentNode method to process the DOM tree. This method (and the methods that it calls) begins with the Document node, and processes all the nodes in the DOM tree to produce the required output.

thisObj.processDocumentNode(document);

Listing 7

Note that the code in Listing 7 passes the Document object's reference to the method named processDocumentNode. This is the root node of the entire DOM tree, and can be treated as type Node, because the Document interface extends the Node interface.

Set the main method aside

My explanation of this program will follow the execution thread through the program. At this point, I will set the discussion of the main method aside temporarily and come back to it later when the processDocumentNode method returns control to the main method.

The processDocumentNode method

The entire processDocumentNode method is shown in Listing 8.

void processDocumentNode(Node node){
//Write one line of text into the output.
out.println("<?xml version=\"1.0\" "
+ "encoding=\"UTF-8\"?>");

processNode(node);

out.flush();
}//end processDocumentNode

Listing 8

This method is used to produce any text required in the output at the document level, such as the XML declaration for an XML document. (As you can see from Listing 8, the code in this method writes an XML declaration into the output.)

Invoke the processNode method

Despite the name that I chose to give to the processDocumentNode method, it doesn't actually process the document node directly. Rather, after sending any required text to the output, it invokes the method named processNode to actually process the document node.

(Note that the Document object's reference is passed to the method named processNode in Listing 8.)

When the DOM tree has been processed ...

When the processNode method returns (after the entire DOM tree has been processed), the processDocumentNode method flushes the output stream and returns control to the main method.

As you will see later, subsequent code in the main method invokes a method that will perform an XSLT transformation on the XML file and write the output into a different output file. I will discuss that method later in this lesson.

The processNode method

There are seven possible types of nodes in an XML document:

root or document node
element node
attribute node
text node
comment node
processing instruction node
namespace node

The processNode method handles the first six types and ignores namespace nodes.

(The DOM API does not represent namespace nodes as separate node objects, so there is no constant in the Node interface that can be used to identify them. This will become clearer later as we examine the code in the processNode method.)

Get and save the node type

The beginning of the processNode method is shown in Listing 9. Note that the method receives an incoming parameter, which is a reference to an object of type Node. The referenced node can be of any type that occurs in a DOM tree.

If the parameter doesn't point to an actual object, the method simply returns, as opposed to throwing a NullPointerException.

void processNode(Node node){

try{
if (node == null){
System.err.println(
"Nothing to do, node is null");
return;
}//end if

//Process the incoming node based on its
// type.
int type = node.getNodeType();

Listing 9

The final statement in Listing 9 invokes the getNodeType method to get and save the type of the node whose reference was received as an incoming parameter.

Process the node

Each time the processNode method is invoked, it receives a Node object's reference as an incoming parameter. The code in Listing 9 determines the type of the incoming node. Listing 10 shows the beginning of a switch statement that is used to initiate the processing of each incoming node based on its type.

switch (type){
case Node.DOCUMENT_NODE:{
if(false){
//cannot be reached in this example
}else{//invoke default behavior
defElOrRtNodeTemp(node);
}//end else
break;
}//end case DOCUMENT_NODE

Listing 10

The switch statement has six cases to handle six types of nodes, plus a default case that ignores everything else.

The DOCUMENT_NODE case

The code in Listing 10 will be executed whenever the incoming method parameter points to a document node.

(Note that this will happen only once during the processing of a DOM tree. The first node processed will always be the document node, and there is only one document node in a DOM tree.)

DOCUMENT_NODE is a constant (public static final variable) that is defined in the Node interface. (The interface provides similar constants for all node types other than namespace nodes.) These constants can be used to distinguish between different node types.

Will invoke default behavior in this case

Note that the code in the case in Listing 10 is an if/else construct. If the conditional clause in the if statement evaluates to true (which is not possible in this case), the code in the if statement will be executed. (This is where I will place the code for custom template rules in subsequent lessons.)

If the conditional clause in the if statement does not evaluate to true, the code in the else statement will be executed. (This is where I have placed the code that mimics the built-in template rules.)

Note that the code in the else statement in Listing 10 invokes a method named defElOrRtNodeTemp. When I discuss this method momentarily, you will see that its behavior mimics one of the built-in template rules that I discussed earlier in this lesson. Before getting to that, however, I want to give you a preview of how I will define custom template rules in future lessons.

Creating custom template rules

As you will see in subsequent lessons, the process for creating a custom template rule is as follows:

Go to the method named processNode, which I am discussing right now.
Identify the case for the node type in the switch statement.
Change the conditional clause in the if statement for that case to implement a match for a particular node of that type.
Write code in the body of the if statement to implement the custom template rule.

If the modified conditional clause evaluates to true, the custom template rule will be executed. If false, the default rule will be executed.
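
As a rough preview of that process (a sketch under stated assumptions, not the code from the subsequent lessons), the following hypothetical class reduces processNode to three cases and replaces the if(false) test in the ELEMENT_NODE case with a match on elements named b. The class name, the StringBuilder output, the bracket decoration, and the sample XML are all assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the Dom11 program.
class CustomRuleSketch {

  static void processNode(Node node, StringBuilder out) {
    switch (node.getNodeType()) {
      case Node.DOCUMENT_NODE:
        applyTemplates(node, out); // default behavior
        break;
      case Node.ELEMENT_NODE:
        if ("b".equals(node.getNodeName())) {
          // Custom rule: wrap the text of <b> in brackets.
          out.append('[')
             .append(node.getTextContent())
             .append(']');
        } else {
          applyTemplates(node, out); // default behavior
        }
        break;
      case Node.TEXT_NODE:
        out.append(node.getNodeValue()); // default behavior
        break;
      default:
        break; // ignore all other node types
    }
  }

  // Process every child of the given node.
  static void applyTemplates(Node node, StringBuilder out) {
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      processNode(children.item(i), out);
    }
  }

  static String transform(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      StringBuilder out = new StringBuilder();
      processNode(doc, out);
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(transform("<a>x<b>y</b>z</a>"));
    // prints x[y]z
  }
}
```

The custom rule fires only for the matching element. Every other node still receives the default behavior.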

The ELEMENT_NODE case

Before getting to the discussion of the method named defElOrRtNodeTemp, I want to show you the ELEMENT_NODE case in Listing 11.

(This is still part of the switch statement that was begun in Listing 10.)

case Node.ELEMENT_NODE:{
if(false){
//unreachable in this example
}else{//invoke default behavior
defElOrRtNodeTemp(node);
}//end else
break;
}//end case ELEMENT_NODE

Listing 11

Except for the type of node in the first line in Listing 11, the code in this case is identical to the code in the DOCUMENT_NODE case shown in Listing 10. Note in particular that the default behavior for this case invokes the same method as the default behavior for the document node case.

As before, the code in the if statement is not reachable in this program.

(That will be true for every case in this program, because this program is designed specifically to exhibit the same behavior as the built-in XSLT template rules.)

The method named defElOrRtNodeTemp

Still following the execution thread, I will set my discussion of the switch statement aside temporarily and discuss the method named defElOrRtNodeTemp. As mentioned above, this method is invoked as the default behavior for document nodes and element nodes in Listings 10 and 11.

I will return to my discussion of the switch statement shortly.

The entire method named defElOrRtNodeTemp is shown in Listing 12.

void defElOrRtNodeTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.ELEMENT_NODE) ||
(nodeType == Node.DOCUMENT_NODE)){

applyTemplates(node,null);
}else{
throw new Exception(
"Bad call to defElOrRtNodeTemp");
}//end else
}//end defElOrRtNodeTemp

Listing 12

Behavior of the method named defElOrRtNodeTemp

This method mimics the behavior of the built-in XSLT template rule shown in Listing 1, and repeated in Figure 5 below for convenient viewing.

  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

Figure 5

As I indicated earlier, the match pattern for this template rule matches the document node and all element nodes.

(Hence, this method is invoked by the two cases in the switch statement corresponding to the document node and an element node.)

Code is straightforward

The code in this method is relatively straightforward. First it tests to confirm that the incoming parameter points to a node of the correct type, and throws an exception if the incoming parameter is not of the correct type.

If the incoming parameter is of the correct type, the code in the method invokes a method named applyTemplates passing the node as a parameter to that method.

(Note the similarity between the code in Listing 12 and the XSLT template rule in Figure 5.)

The method named applyTemplates

Continuing to follow the execution thread, I will now discuss the method named applyTemplates, shown in Listing 13.

void applyTemplates(Node node,String select){
NodeList children = node.getChildNodes();
if (children != null){
int len = children.getLength();
//Iterate on NodeList of child nodes.
for (int i = 0; i < len; i++){
if((select == null) ||
(select.equals(children.item(i).
getNodeName()))){

//Recursive method call
processNode(children.item(i));

}//end if
}//end for loop
}//end if children != null

}//end applyTemplates

Listing 13

Behavior of the apply-templates rule

The applyTemplates method partially emulates the XSLT apply-templates rule discussed earlier in this lesson, and shown in Figure 6.

<xsl:apply-templates
  optional attribute select="..."
  optional attribute mode="..."
/>

Figure 6

The apply-templates rule has two attributes, select and mode.

(The applyTemplates method shown in Listing 13 does not support the mode attribute. Perhaps I will update the method in a future lesson to support this attribute.)

As I explained earlier in this lesson,

"The xsl:apply-templates rule processes all child nodes of the context node that match an optional select attribute. If the select attribute is omitted, all child nodes are processed."

Behavior of the method named applyTemplates

The applyTemplates method shown in Listing 13 receives two incoming parameters:

The context node.
The select parameter.

If the select parameter is null, the method examines and processes all child nodes of the context node. Otherwise, it processes only those child nodes that match the select parameter.

The code in Listing 13 invokes the getChildNodes method on the context node to get a list of all child nodes of the context node. If there are no child nodes, it quietly returns.

A recursive method call

If there are child nodes, the method uses a for loop to process all child nodes that match the select parameter as described above.

(Note that the match or lack thereof is based on the name of the node obtained by invoking the method named getNodeName on the child node being examined.)

For each matching child node, the applyTemplates method makes a recursive call to the method named processNode, passing the child node's reference as a parameter to the processNode method.
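
The effect of the select parameter can be seen in isolation with a sketch such as the following. It is a simplified, hypothetical variant of the methods discussed above: the class name and sample XML are assumptions, output goes to a StringBuilder instead of a PrintWriter, and processNode is reduced to the default behavior for document, element, and text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the Dom11 program.
class ApplyTemplatesSketch {

  // Process the children of node whose names match select,
  // or all children if select is null.
  static void applyTemplates(
      Node node, String select, StringBuilder out) {
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      if (select == null
          || select.equals(child.getNodeName())) {
        processNode(child, out); // recursive call
      }
    }
  }

  // Default behavior only: recurse into documents and
  // elements, copy the value of text nodes.
  static void processNode(Node node, StringBuilder out) {
    short type = node.getNodeType();
    if (type == Node.DOCUMENT_NODE
        || type == Node.ELEMENT_NODE) {
      applyTemplates(node, null, out);
    } else if (type == Node.TEXT_NODE) {
      out.append(node.getNodeValue());
    }
  }

  // Apply templates to the children of the root element.
  static String run(String xml, String select) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      StringBuilder out = new StringBuilder();
      applyTemplates(doc.getDocumentElement(), select, out);
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String xml = "<r><a>1</a><b>2</b><a>3</a></r>";
    System.out.println(run(xml, null)); // prints 123
    System.out.println(run(xml, "a"));  // prints 13
  }
}
```

Note that the name comparison is a plain string match on getNodeName. It is much narrower than a real XPath select expression.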

Return to defElOrRtNodeTemp method

Eventually, the recursive process will end, and control will return to the defElOrRtNodeTemp method shown in Listing 12. From there, control will return to either the DOCUMENT_NODE case or the ELEMENT_NODE case in the switch statement in Listing 10 or Listing 11 from which the defElOrRtNodeTemp method was called.

That, in turn, brings us back to a discussion of the other cases in the switch statement.

The TEXT_NODE and ATTRIBUTE_NODE cases

The next two cases from the switch statement that I will discuss are shown in Listing 14. (The switch statement began in Listing 10.)

Listing 14 shows the cases for text nodes and attribute nodes. I have grouped these two cases together because the default behavior of both cases is to invoke the method named defTextOrAttrTemp, and to send the String returned by that method to the output.

case Node.TEXT_NODE:{
if(false){
//unreachable in this program
}else{//invoke default behavior
out.print(defTextOrAttrTemp(node));
}//end else
break;
}//end case Node.TEXT_NODE

case Node.ATTRIBUTE_NODE:{
if(false){
//unreachable in this program
}else{//invoke default behavior
out.print(defTextOrAttrTemp(node));
}//end else
break;
}//end case Node.ATTRIBUTE_NODE

Listing 14

The defTextOrAttrTemp method

Once again, following the execution thread, I will now discuss the method named defTextOrAttrTemp. This method is called whenever:

The processNode method is called with a reference to either a text node or an attribute node, and
The default behavior for the node type is executed.

Listing 15 shows the entire method named defTextOrAttrTemp.

String defTextOrAttrTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.ATTRIBUTE_NODE)
|| (nodeType == Node.TEXT_NODE)){

//Get and return value of context node.
return valueOf(node,".");
}else{
throw new Exception(
"Bad call to defTextOrAttrTemp");
}//end else
}//end defTextOrAttrTemp

Listing 15

Emulates a built-in XSLT template rule

This method emulates the built-in XSLT template rule shown in Listing 2 and repeated in Figure 7 below for convenient viewing.

  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

Figure 7

As I told you earlier, this template rule matches all text nodes and all attribute nodes. Therefore, the defTextOrAttrTemp method is invoked by the default behavior of either the TEXT_NODE case or the ATTRIBUTE_NODE case in the switch statement in Listing 14.

Similar behavior

Once again, note the similarity between the method named defTextOrAttrTemp in Listing 15 and the template rule shown in Figure 7.

In Figure 7, the template rule executes the xsl:value-of XSLT element to send the value of the context node to the output.

The method shown in Listing 15 invokes a method named valueOf, passing "." as a parameter (note the period between the quotation marks). The value returned by that method is sent to the output by the code in the default behaviors of the two cases in Listing 14.

The method named valueOf

The method named valueOf, which begins in Listing 16, is fairly complex. I will discuss portions of this method in this lesson and will discuss the remainder of the method in subsequent lessons.

This method emulates an <xsl:value-of select="???"/> XSLT element.

Three forms of method call

The method requires two parameters. The first parameter is of type Node, and is the context node. The second parameter is of type String and is a select parameter.

The valueOf method recognizes three forms of call:

valueOf(Node theNode,String "@attrName")
valueOf(Node theNode,String ".")
valueOf(Node theNode,String "nodeName")

In the first form, the method returns the text value of the named attribute of theNode. An attribute is specified by a select value that begins with @. If the attribute doesn't exist, the method returns an empty string.

In the second form, which is the only form actually used in this program, the value of the select parameter is a String containing a single period. In this form, the method returns the concatenated text values of the context node and all descendants of the context node (including text nodes that are children of the context node).

In the third form, the method returns the concatenated text values of all descendants of a specified child node of the context node. If the context node has more than one child node with the specified name, only the first one found is processed. The others are ignored.

Features not supported

The valueOf method does not support the following features, which are standard features of the xsl:value-of XSLT element:

disable-output-escaping
processing instruction nodes
comment nodes
namespace nodes

Will discuss the second form only

Since the second form of call listed above is the only form actually used in this program, I will discuss only those portions of the method that support that form. I will defer discussion of the other portions of the method until they are used in subsequent lessons.

Process the context node

The code in Listing 16 picks up at the point where it is determined that the incoming value for select is a String object's reference with a value of "." (note the period between the quotation marks). This is a request to return the value of the context node.

This method supports two possibilities for the context node:

Element node - return the concatenated text values of all descendant nodes of the context node.
Text node - return the text value of the text node.

Clearly the first possibility is the more complex of the two, but as you will see, recursion makes it easy to accomplish.

When the context node is an element node ...

The code in Listing 16 shows the beginning of the code required to process the context node as an element node.

public String valueOf(Node node,String select){

//code deleted for brevity

else if(select != null
&& select.equals(".")){

int nodeType = node.getNodeType();
if(nodeType == Node.ELEMENT_NODE){
//Process the context node as an element
// node.

Listing 16

Get list of child nodes

In preparation for processing all descendant nodes of the context node, the code in Listing 17 gets a list of child nodes, along with the length of the list.

In addition, the code in Listing 17 initializes a String variable named nodeTextValue that will be used to collect the concatenated text values of the descendant nodes. Note that this variable is initialized to contain an empty string.

NodeList childNodes =
node.getChildNodes();
int listLen = childNodes.getLength();

String nodeTextValue = "";//result

Listing 17

Process child nodes of context node

Having gotten a list of child nodes of the context node, all that is required to accomplish the objective is to make a series of recursive calls to the valueOf method, passing each child node in turn to the valueOf method as shown in Listing 18.

for(int j = 0; j < listLen; j++){
nodeTextValue +=
valueOf(childNodes.item(j),".");
}//end for loop

return nodeTextValue;

Listing 18

Each child node becomes the new context node upon re-entry into the valueOf method, and each call requests the value of the context node (the current child node) by passing "." for the select parameter.

Concatenation

The code in Listing 18 also deals with concatenation. The value returned from each call to the valueOf method is concatenated with the text value already stored in the variable named nodeTextValue.

Finally, after all child nodes have been processed, the code in Listing 18 returns the concatenated value stored in the variable named nodeTextValue.

When the context node is a text node ...

If you understood all of the above (including the recursion), you should find it easy to understand the code shown in Listing 19. Listing 19 shows the case where the context node is a text node.

}else if(nodeType == Node.TEXT_NODE){
return node.getNodeValue();

Listing 19

In this case, the method simply returns the value obtained by invoking getNodeValue on the text node.

One other possibility

There is one other possibility that is handled by the code in Listing 20. That possibility is that the context node is neither a text node nor an element node. In that case, the valueOf method returns an empty string.

}else{
//ignore all other context node types
}//end else
}//end if for context node

//code deleted for brevity

return "";//empty string
}//end method valueOf

Listing 20
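
Because the fragments in Listings 16 through 20 omit parts of the method, it may help to see the second form of call assembled into one piece. The following is a minimal sketch of that form only, not the complete valueOf method from the program. The class name, the method name valueOfContext, and the sample XML are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the Dom11 program.
class ValueOfSketch {

  // Emulates value-of with select="." for element and
  // text nodes only.
  static String valueOfContext(Node node) {
    short type = node.getNodeType();
    if (type == Node.ELEMENT_NODE) {
      // Concatenate the values of all child nodes. Each
      // child becomes the context node of a recursive call.
      StringBuilder text = new StringBuilder();
      NodeList children = node.getChildNodes();
      for (int j = 0; j < children.getLength(); j++) {
        text.append(valueOfContext(children.item(j)));
      }
      return text.toString();
    } else if (type == Node.TEXT_NODE) {
      return node.getNodeValue();
    }
    return ""; // ignore all other context node types
  }

  // Parse the XML and return the value of its root element.
  static String valueOfRoot(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      return valueOfContext(doc.getDocumentElement());
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(
        valueOfRoot("<a>x<b>y<!--c--></b>z</a>"));
    // prints xyz
  }
}
```

The comment inside b contributes nothing, because a comment node is neither an element node nor a text node.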

Other types of nodes in the switch statement

Returning to the switch statement that began in Listing 10, we find two additional cases, each of which invokes the same method by default:

COMMENT_NODE
PROCESSING_INSTRUCTION_NODE

The default behavior of the cases corresponding to both of these node types is to invoke the method named defComOrProcInstrTemp, as shown in Listing 21.

case Node.COMMENT_NODE:{
if(false){
//unreachable in this program
}else{//invoke default behavior
defComOrProcInstrTemp(node);
}//end else
break;
}//end case COMMENT_NODE

case Node.PROCESSING_INSTRUCTION_NODE:{
if(false){
//unreachable in this program
}else{//invoke default behavior
//First save proc instr for later
// use.
procInstr.add(node);
//Now invoke default behavior.
defComOrProcInstrTemp(node);
}//end else
break;
}//end case PROCESSING_INSTRUCTION_NODE

Listing 21

Save all processing instructions

I will discuss the defComOrProcInstrTemp method shortly. First, however, I will explain the extra code that appears in the default portion of the processing instruction node case in Listing 21.

The purpose of a processing instruction in an XML file is to provide instructions to processing programs such as this one. The XML file shown in Listing 29 contains the three processing instructions shown in Listing 22.

<?dummy-target dummy-data="def"?>
<?xml-stylesheet
type="text/xsl" href="Dom11.xsl"?>
<?false-target false-data="ghi"?>

Listing 22

Stylesheet identified in a processing instruction

The first and third of the three processing instructions are dummy processing instructions put there to test the capabilities of this program. However, the processing instruction in the middle is a real processing instruction that specifies the name of the file containing a stylesheet. That stylesheet will be used later when this program causes an XSLT transformation to take place using the XML file in Listing 29, and the stylesheet file identified in Listing 22. (That stylesheet actually appears in Listing 30.)

In order to use that processing instruction to identify the stylesheet file, this program must capture the processing instruction and extract the file name from the processing instruction. A statement in the second case in Listing 21 causes references to all processing instruction nodes to be added to and saved in a static variable of the Dom11 class named procInstr.

That information will be used later to extract the name of the stylesheet file from the processing instruction.

The defComOrProcInstrTemp method

Both of the switch cases shown in Listing 21 invoke this method as their default behavior. A complete listing of the defComOrProcInstrTemp method is shown in Listing 23.

String defComOrProcInstrTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.COMMENT_NODE) ||
(nodeType ==
Node.PROCESSING_INSTRUCTION_NODE)){

return "";//empty string
}else{
throw new Exception("Bad call to " +
"defComOrProcInstrTemp");
}//end else
}//end defComOrProcInstrTemp

Listing 23

The defComOrProcInstrTemp method emulates the built-in template rule shown in Figure 8.

<xsl:template
  match="processing-instruction()|comment()"/>

Figure 8

According to Nutshell, the built-in template rule for comments and processing instructions doesn't output anything into the output tree. Therefore, the defComOrProcInstrTemp method shown in Listing 23 simply returns an empty string.
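
One way to observe this behavior directly is to run a stylesheet that contains no template rules at all, so that only the built-in rules apply. The following sketch does that in memory. The class name, the stylesheet string, and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of the Dom11 program.
class EmptyStylesheetSketch {

  // A stylesheet with no template rules. The xsl:output
  // element only suppresses the XML declaration.
  static final String XSL =
      "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "</xsl:stylesheet>";

  static String transform(String xml) {
    try {
      Transformer t = TransformerFactory.newInstance()
          .newTransformer(
              new StreamSource(new StringReader(XSL)));
      StringWriter out = new StringWriter();
      t.transform(new StreamSource(new StringReader(xml)),
                  new StreamResult(out));
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // The processing instruction and the comment vanish.
    // Only the text node survives.
    System.out.println(transform(
        "<?note keep?><a><!-- hidden -->shown</a>"));
    // prints shown
  }
}
```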

The namespace node case

The default case for the switch statement begun in Listing 10 is shown in Listing 24.

default:{
//Ignore all other node types.
}//end default

}//end switch

Listing 24

Since the switch statement contains explicit cases for six of the seven types of nodes in the XSLT data model, the default case handles everything else. As I mentioned earlier, the Node interface doesn't provide a constant that can be used to identify namespace nodes, so it isn't possible to create an explicit case for namespace nodes. (The default case also receives any DOM-specific node types for which no case is provided, such as document type nodes and CDATA section nodes.)

Also, here is what Nutshell has to say about the built-in template rule for namespace nodes:

"A ... template rule ... instructs the processor not to copy any part of the namespace node to the output."

Consistent with that rule, the default case in Listing 24 doesn't send anything to the output.
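
A DOM tree can also contain node types that lie outside the seven XSLT node types, and those nodes reach the default case as well. The following sketch walks a small in-memory document the same way that the program does (descending only from document and element nodes) and reports which nodes would reach the default case. The class name and sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the Dom11 program.
class DefaultCaseSketch {

  static void collect(Node node, List<String> found) {
    switch (node.getNodeType()) {
      case Node.DOCUMENT_NODE:
      case Node.ELEMENT_NODE: {
        // Descend into children, as applyTemplates does.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
          collect(children.item(i), found);
        }
        break;
      }
      case Node.ATTRIBUTE_NODE:
      case Node.TEXT_NODE:
      case Node.COMMENT_NODE:
      case Node.PROCESSING_INSTRUCTION_NODE:
        break; // these have explicit cases in processNode
      case Node.DOCUMENT_TYPE_NODE:
        found.add("DOCUMENT_TYPE_NODE");
        break;
      case Node.CDATA_SECTION_NODE:
        found.add("CDATA_SECTION_NODE");
        break;
      default:
        found.add("type " + node.getNodeType());
    }
  }

  // List the node types that would reach the default case.
  static List<String> unhandled(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      List<String> found = new ArrayList<String>();
      collect(doc, found);
      return found;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(unhandled(
        "<!DOCTYPE a [<!ELEMENT a (#PCDATA)>]>"
        + "<a><![CDATA[t]]></a>"));
  }
}
```

With a default factory configuration, the document type declaration and the CDATA section each appear as their own node type. A switch statement with only the six cases would silently ignore both, which means that text inside a CDATA section would be lost unless the factory is configured to coalesce CDATA into text nodes.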

End of the processNode method

I have discussed everything of significance in the processNode method. Continuing to follow the execution thread, I will now turn my attention back to the main method.

Perform an XSLT transformation

After the code has been executed to process the document using program code (beginning with the invocation of the processDocumentNode method in Listing 7), the statement in Listing 25 invokes the doXslTransform method to cause the XML document to be transformed using the stylesheet identified in one of the processing instructions in the XML file.

thisObj.doXslTransform(
document,argv[1],procInstr);

Listing 25

Stylesheet reference has been saved

The success of the method call in Listing 25 depends on the stylesheet processing instruction having been saved while the document was being processed. Otherwise, it would be necessary to add code in this method to search the DOM tree for the stylesheet processing instruction.

All processing instructions are saved in a Vector object by this program. The Vector object's reference is passed as the third parameter to this method. The first parameter is a reference to the Document or root node in the DOM tree. The second parameter is the name of the output file.

The doXslTransform method

The doXslTransform method begins in Listing 26. This method uses an XSLT stylesheet file to transform an incoming Document object into an output file. A large portion of the code in this method is dedicated to:

Identifying the processing instruction containing the stylesheet information.
Extracting the stylesheet information from the processing instruction.

Identify the processing instruction containing the stylesheet reference

The code in Listing 26 searches the Vector object seeking a processing instruction node that contains a stylesheet reference.

void doXslTransform(Document document,
String outFile,
Vector procInstr)
throws Exception{
try{
//Get stylesheet ID from proc instr.
ProcessingInstruction pi = null;
boolean piFlag = false;
int size = procInstr.size();

//Search for a stylesheet in the Vector
// containing processing instruction nodes.
for(int i = 0; i < size; i++){
pi = (ProcessingInstruction)procInstr.
get(i);
if(pi.getTarget().startsWith(
"xml-stylesheet") && pi.getData().
startsWith("type=\"text/xsl\"")){
//Looks like a good stylesheet.
piFlag = true;
break;
}//end if
}//end for loop
if(piFlag == false){//still false?
throw new Exception(
"No valid stylesheet");
}//end if

Listing 26

How does this work?

To see how this code works, first take a look at the processing instruction in the XML file that contains the stylesheet reference. This processing instruction was shown in Listing 22, and is repeated below in Figure 9 for convenient viewing.

<?xml-stylesheet 
  type="text/xsl" href="Dom11.xsl"?>

Figure 9

The purpose of a processing instruction is to provide information to processing programs that will be used to process the XML file.

Format of a processing instruction

According to Nutshell,

"A processing instruction begins with <? and ends with ?>. Immediately following the <? is an XML name called the target, possibly the name of the application for which this processing instruction is intended or possibly just an identifier for this particular processing instruction. The rest of the processing instruction contains text in a format appropriate for the application for which the instruction is intended."

Applying this knowledge to the stylesheet processing instruction in Figure 9, you can see that the target consists of the following text: xml-stylesheet.

Accessing the target and the data

The target of a processing instruction node can be accessed in Java by invoking the getTarget method on the processing instruction node's reference.

The remainder of the text in the processing instruction can be accessed by invoking the getData method on the same reference.

The code in Listing 26 examines each object in the Vector, invoking getTarget and getData on it, looking for a processing instruction whose target begins with xml-stylesheet and whose data begins with type="text/xsl". When a match is found, the code breaks out of the for loop.

If no match is found, the code in Listing 26 throws an exception.

Extract the stylesheet file name

Having identified the processing instruction that contains the stylesheet reference, the code in Listing 27 uses the getData method of the ProcessingInstruction interface, along with some methods of the String class, to extract the name of the file containing the stylesheet.

String xslFile = pi.getData().
substring(pi.getData().indexOf(
"href=")+6);
//Eliminate the quotation mark at the end
xslFile = xslFile.substring(
0,xslFile.length()-1);

Listing 27

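The ability to extract the file name is based on the known format of the stylesheet processing instruction shown in Figure 9. The offset of 6 skips past the five characters in href= plus the opening quotation mark. The code also assumes that href is the last item in the processing instruction, so that the only character following the file name is the closing quotation mark.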
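The extraction in Listing 27 depends on href being the last item in the processing instruction data. The following is a minimal sketch of a more tolerant approach that uses a regular expression to pull out a named item regardless of its position or the kind of quotation mark used. The class name PseudoAttr and the method get are hypothetical names chosen for this illustration. The sketch does not handle character references inside the value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PseudoAttr {

  // Returns the value of the named item (such as href or type) in
  // the data of a processing instruction, or null if it is absent.
  static String get(String piData, String name) {
    Pattern p = Pattern.compile(
        "(?:^|\\s)" + Pattern.quote(name)
        + "\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");
    Matcher m = p.matcher(piData);
    if (!m.find()) {
      return null;
    }
    // Group 2 holds a double-quoted value, group 3 a single-quoted one.
    return m.group(2) != null ? m.group(2) : m.group(3);
  }

  public static void main(String[] args) {
    String data = "type=\"text/xsl\" href=\"Dom11.xsl\"";
    System.out.println(get(data, "href"));
    System.out.println(get(data, "type"));
  }
}
```

In the doXslTransform method, a call such as PseudoAttr.get(pi.getData(), "href") could stand in for the two substring operations, with a null return treated the same as a missing stylesheet.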

Do the XSLT transformation

The remaining code in the doXslTransform method is shown in Listing 28.

//Get a TransformerFactory object
TransformerFactory xformFactory =
TransformerFactory.newInstance();

//Get an XSL Transformer object based on
// the XSL file discovered above.
Transformer transformer =
xformFactory.newTransformer(
new StreamSource(
new File(xslFile)));

//Get a DOMSource object that represents
// the DOM tree.
DOMSource source = new DOMSource(document);

//Get an output stream for the output
// file.
PrintWriter xformStream = new PrintWriter(
new FileOutputStream(outFile));

//Get a StreamResult object that points to
// the output file. Then transform the DOM
// sending text to the output file.
StreamResult xformResult =
new StreamResult(xformStream);

//Do the transform
transformer.transform(source,xformResult);
//Close the output file.
xformStream.close();
}catch(Exception e){
e.printStackTrace(System.err);
}//end catch

}//end doXslTransform

Listing 28

You have seen this code before

The code in Listing 28 is not new to this series of lessons. This code was discussed in detail in the earlier lesson entitled Getting Started with Java JAXP and XSL Transformations (XSLT). Therefore, other than to point out one difference relative to the previous code, and to review the steps involved, I won't discuss the code in Listing 28 further in this lesson.

Steps for creating a Transformer object

The following two steps are required to create a Transformer object. Once a Transformer object is available, it can be used to transform one DOM tree into another DOM tree.

Create a TransformerFactory object by invoking the static newInstance method of the TransformerFactory class.
Invoke the newTransformer method on the TransformerFactory object.
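The two steps can be sketched as follows for the case where the result is a new DOM tree rather than an output file. This is a minimal sketch, not code from Dom11. It uses the no-argument version of newTransformer, and the class name DomToDom and the helper methods copy and parse are names that I made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomToDom {

  // Copies one DOM tree into a new DOM tree using a Transformer.
  static Document copy(Document input) {
    try {
      // Step 1: get a TransformerFactory object.
      TransformerFactory factory = TransformerFactory.newInstance();
      // Step 2: get a Transformer object from the factory.
      Transformer transformer = factory.newTransformer();
      // An empty DOMResult lets the transformer create the
      // Document node that holds the result tree.
      DOMResult result = new DOMResult();
      transformer.transform(new DOMSource(input), result);
      return (Document) result.getNode();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Hypothetical helper: parses XML text into a DOM tree.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Document original = parse("<top><title>Java</title></top>");
    Document duplicate = copy(original);
    System.out.println(
        duplicate.getDocumentElement().getNodeName());
  }
}
```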

One important difference

There is one important difference between the code in Listing 28 and the code in the earlier lesson. The two programs invoke different overloaded versions of the newTransformer method of the TransformerFactory class.

The earlier lesson entitled Getting Started with Java JAXP and XSL Transformations (XSLT) invoked a version that took no parameters and returned a Transformer object that simply copies a source tree to a result tree.

The code in Listing 28 invokes a version of the newTransformer method that takes the stylesheet file as an input parameter and returns a Transformer object that uses the stylesheet file to perform an XSLT transformation.
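The difference between the two versions can be seen in a small side-by-side sketch. The XML text and the names TwoTransformers, copy, transform, and run are assumptions made for this illustration. The stylesheet is supplied as a string and contains no template rules, so the result should be governed entirely by the built-in template rules.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TwoTransformers {
  // Sample data invented for this sketch.
  static final String XML =
      "<top><title>Java</title><price>$9.95</price></top>";
  // A stylesheet with no template rules, like the one in Listing 33.
  static final String EMPTY_XSL =
      "<xsl:stylesheet version=\"1.0\" "
      + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"/>";

  // No-argument version: copies the source tree to the result.
  static String copy(String xml) {
    try {
      Transformer t =
          TransformerFactory.newInstance().newTransformer();
      return run(t, xml);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Stylesheet version: applies the stylesheet to the source tree.
  static String transform(String xml, String xsl) {
    try {
      Transformer t = TransformerFactory.newInstance()
          .newTransformer(new StreamSource(new StringReader(xsl)));
      return run(t, xml);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Runs the transformer and returns the output as a string.
  private static String run(Transformer t, String xml)
      throws Exception {
    t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    t.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(copy(XML));                 // markup preserved
    System.out.println(transform(XML, EMPTY_XSL)); // text only
  }
}
```

The first call should reproduce the elements, while the second should produce only the concatenated text of the document, which is the default behavior that the program code in Dom11 emulates.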

That concludes the discussion of the program named Dom11.

Run the Program

I encourage you to copy the Java code, XML files, and XSL files from the listings near the end of this lesson. Compile and execute the programs. Experiment with them, making changes, and observing the results of your changes.

Summary

I explained default XSLT behavior and showed you how to write Java code that mimics that behavior. The resulting Java code serves as a skeleton for more advanced transformation programs.

What's Next?

In the next lesson, I will show you how to write a Java program that mimics an XSLT transformation for converting an XML file into a text file. I will also show that once you have a library of Java methods that emulate XSLT elements, it is no more difficult to write a Java program to transform an XML document than it is to write an XSL stylesheet to transform the same document.

Complete Program Listings

Complete listings of the various files discussed in this lesson are contained in the listings that follow.

<?xml version="1.0"?>

<!DOCTYPE top [
<!ELEMENT top (theData)*>
<!ELEMENT theData (title,author,price)*>
<!ELEMENT title (#PCDATA | subtitle)*>
<!ELEMENT author (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT subtitle (#PCDATA)>
<!ATTLIST theData attr CDATA #IMPLIED>
<!ATTLIST subtitle position CDATA #IMPLIED>
]>




<?dummy-target dummy-data="def"?>
<?xml-stylesheet
type="text/xsl" href="Dom11.xsl"?>
<?false-target false-data="ghi"?>

<top>

<theData attr="Dummy Attr Value">
<title>Java
<subtitle position="Low">really</subtitle>rules
</title>
<author>R.Baldwin</author>
<price>$9.95</price>
</theData>

<theData>
<title>Python</title>
<author>R.Baldwin</author>
<price>$15.42</price>
</theData>

<theData>
<title>XML</title>
<author>R.Baldwin</author>
<price>$19.60</price>
</theData>

</top>

Listing 29

<?xml version='1.0'?>

<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">


<xsl:template match="*|/">
<xsl:apply-templates/>
</xsl:template>


<xsl:template match="text()|@*">
<xsl:value-of select="."/>
</xsl:template>

</xsl:stylesheet>

Listing 30

/*File Dom11.java
Copyright 2003 R.G.Baldwin

This program implements all six built-in default
template rules for an XML processor. In
addition, it implements a couple of other
template rules that are required to support
the built-in rules, such as xsl:value-of.

As such, the program serves as the skeleton for
the definition of custom template rules.

To create a custom template rule:
1. Go to the processNode method.
2. Identify the node type.
3. Change the conditional clause in the if
statement to implement the match.
4. Write code in the body of the if statement to
implement the custom rule.

If the modified conditional clause evaluates to
true, the custom rule will be executed. If
false, the default rule will be executed.

As written, this program extracts and
concatenates all text values from a specified
XML file, and writes that text into a result
file, using two different approaches:

1. An XSLT style sheet and transformation.
2. Program code that emulates the behavior of the
XSL transformation.

In particular, this program illustrates Java code
that emulates the XSLT templates in the files
named Dom11.xsl and Dom11a.xsl. These two XSL
files differ in terms of their dependence on the
built-in templates.

Dom11.xsl explicitly includes template rules that
replicate the built-in rules for text, nodes, and
documents.

Dom11a.xsl doesn't explicitly include any
template rules, but depends entirely on built-in
rules for proper operation.

Both XSL files produce the same result when
processed against the XML files named Dom11.xml
and Dom11a.xml, demonstrating the behavior of
the built-in template rules.

The execution of these template rules causes the
explicit template rules, or the built-in template
rules to be executed on every node, thereby
causing the contents of every text node to be
concatenated and written into the result file.

The program requires three command line
parameters in the following order:
1. The name of the input XML file - must be
Dom11.xml or Dom11a.xml.
2. The name of the output file to be
produced by the XSL transformation.
3. The name of the output file to be
produced by the program code that emulates
the XSL transformation.

The name of the XSL stylesheet file is extracted
from the processing instruction in the XML file.

The program begins by executing code to transform
the incoming XML file in a way that mimics the
XSL Transformation. Along the way, it saves the
processing instructions containing the ID of the
stylesheet file for use by the XSLT process
later. Otherwise, the code that performs the
XSL transformation later would have to search the
DOM tree for the XSL stylesheet file.

Then the program uses the XSLT style sheet to
transform the XML file into a result file.

No effort was made to provide meaningful
information about errors and exceptions.

Tested with SDK 1.4.2 under WinXP.
************************************************/

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;

import org.w3c.dom.*;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.*;

import java.util.*;
import java.io.*;

public class Dom11{

PrintWriter out;//output stream
//Save processing instruction nodes here
static Vector procInstr = new Vector();

public static void main(String argv[]){
if (argv.length != 3){
System.err.println(
"usage: java Dom11 "
+ "xmlFileIn "
+ "xformFileOut "
+ "codeFileOut");
System.exit(0);
}//end if

try{
//Get a factory object for DocumentBuilder
// objects
///
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();

//Configure the factory object. Change
// the following parameter to false for a
// non-validating parser.
///
factory.setValidating(true);
factory.setNamespaceAware(false);
//The following statement causes the parser
// to ignore cosmetic whitespace between
// elements.
///
factory.
setIgnoringElementContentWhitespace(true);

//Get a DocumentBuilder (parser) object
///
DocumentBuilder builder =
factory.newDocumentBuilder();

//Parse the XML input file to create a
// Document object that represents the
// input XML file.
///
Document document = builder.parse(
new File(argv[0]));

//Instantiate an object of this class
///
Dom11 thisObj = new Dom11();

//TRANSFORMATION THROUGH PROGRAM CODE
//Use program code to transform the
// DOM tree into an output file.
//
//Get an output stream for the output
// produced by the program code. This
// stream object is used by several
// methods, so it was instantiated at this
// point and saved as an instance variable
// of the object.
///
thisObj.out = new PrintWriter(
new FileOutputStream(argv[2]));

//Process the DOM tree, beginning with the
// Document node to produce the output.
// Invocation of processDocumentNode starts
// a recursive process that processes the
// entire DOM tree.
///
thisObj.processDocumentNode(document);

//XSLT TRANSFORMATION
//Use XSLT to transform the DOM tree into
// an output file. Note that the success
// of this method call depends on the
// stylesheet processing instruction having
// been saved while the transformation was
// being performed using program code
// above. Otherwise, it would be necessary
// to include the code in this method to
// search the DOM tree for the stylesheet
// processing instruction. All processing
// instructions are saved in a Vector
// object, which is passed as the third
// parameter to this method.
///
thisObj.doXslTransform(
document,argv[1],procInstr);

}catch(Exception e){
//Note that no effort was made to provide
// meaningful results in the event of an
// exception or error.
///
e.printStackTrace(System.err);
}//end catch
}// end main()
//-------------------------------------------//

//This method is used to produce any text
// required in the output at the document
// level, such as the XML declaration for an
// XML document.
///
void processDocumentNode(Node node){
//Write one line of text into the output.
///
out.println("<?xml version=\"1.0\" "
+ "encoding=\"UTF-8\"?>");

//Go process the root (document) node. This
// method call triggers a recursive process
// that processes the entire DOM tree.
///
processNode(node);

out.flush();
}//end processDocumentNode
//-------------------------------------------//

//There are seven kinds of nodes:
// root or document
// element
// attribute
// text
// comment
// processing instruction
// namespace
//
//This method handles the first six.
// Apparently it is not possible to handle
// namespace nodes in Java because there is
// no constant in the Node class to identify
// namespace nodes
///
void processNode(Node node){

try{
if (node == null){
System.err.println(
"Nothing to do, node is null");
return;
}//end if

//Process the incoming node based on its
// type.
///
int type = node.getNodeType();

//To define an overriding template rule,
// insert the matching condition in the
// conditional clause of the if statement,
// and provide code to implement the rule
// in the body of the if statement. If the
// conditional clause evaluates to true,
// the default rule for that element type
// will not be processed.
///
switch (type){
case Node.TEXT_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
out.print(defTextOrAttrTemp(node));
}//end else
break;
}//end case Node.TEXT_NODE

case Node.ATTRIBUTE_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
out.print(defTextOrAttrTemp(node));
}//end else
break;
}//end case Node.ATTRIBUTE_NODE

case Node.ELEMENT_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
defElOrRtNodeTemp(node);
}//end else
break;
}//end case ELEMENT_NODE

case Node.DOCUMENT_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
defElOrRtNodeTemp(node);
}//end else
break;
}//end case DOCUMENT_NODE

case Node.COMMENT_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
defComOrProcInstrTemp(node);
}//end else
break;
}//end case COMMENT_NODE

case Node.PROCESSING_INSTRUCTION_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
//Save proc instr for later use
procInstr.add(node);
}else{//invoke default behavior
//First save proc instr for later
// use.
///
procInstr.add(node);
//Now invoke default behavior.
///
defComOrProcInstrTemp(node);
}//end else
break;
}//end case PROCESSING_INSTRUCTION_NODE

default:{
//Ignore all other node types.
}//end default

}//end switch

}catch(Exception e){
e.printStackTrace(System.err);
}//end catch
}//end processNode(Node)
//-------------------------------------------//

//This method emulates the following default
// template rule:
// <xsl:template match="text()|@*">
// <xsl:value-of select="."/>
// </xsl:template>
///
String defTextOrAttrTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.ATTRIBUTE_NODE)
|| (nodeType == Node.TEXT_NODE)){
//Get and return the value of the context
// node.
///
return valueOf(node,".");
}else{
throw new Exception(
"Bad call to defTextOrAttrTemp method");
}//end else
}//end defTextOrAttrTemp
//-------------------------------------------//

//This method emulates the following default
// template rule:
// <xsl:template match="*|/">
// <xsl:apply-templates/>
// </xsl:template>
///
void defElOrRtNodeTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.ELEMENT_NODE) ||
(nodeType == Node.DOCUMENT_NODE)){
//Note that the following is a recursive
// method call.
///
applyTemplates(node,null);
}else{
throw new Exception(
"Bad call to defElOrRtNodeTemp");
}//end else
}//end defElOrRtNodeTemp
//-------------------------------------------//

//This method emulates the following default
// template rule:
// <xsl:template
// match="processing-instruction()|comment()"/>
///
String defComOrProcInstrTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.COMMENT_NODE) ||
(nodeType ==
Node.PROCESSING_INSTRUCTION_NODE)){
//According to Nutshell, pg 148, the
// default rule for comments and processing
// instructions doesn't output anything
// into the result tree.
///
return "";//empty string
}else{
throw new Exception("Bad call to " +
"defComOrProcInstrTemp");
}//end else
}//end defComOrProcInstrTemp
//-------------------------------------------//

//See Nutshell, pg 148 for an explanation as to
// why it is not possible to write a Java
// method that emulates the default namespace
// template.
///
void defaultNamespaceTemplate(Node node)
throws Exception{
throw new Exception("See Nutshell pg 148 " +
"regarding default behavior for " +
"namespace template.");
}//end defaultNamespaceTemplate
//-------------------------------------------//

//Simulates an XSLT apply-templates rule.
// <xsl:apply-templates
// optional select = "..."
// optional mode = "..."
// >
//Note that the mode attribute is not supported
// in this version.
//If the select parameter is null, all child
// nodes are processed.
void applyTemplates(Node node,String select){
NodeList children = node.getChildNodes();
if (children != null){
int len = children.getLength();
//Iterate on NodeList of child nodes.
for (int i = 0; i < len; i++){
if((select == null) ||
(select.equals(children.item(i).
getNodeName()))){
//Note that the following is a
// recursive method call.
///
processNode(children.item(i));
}//end if
}//end for loop
}//end if children != null

}//end applyTemplates
//-------------------------------------------//

//This method simulates an XSLT
// <xsl:value-of select="???"/>
// The general form of the method call is
// valueOf(Node theNode,String select)
//
//The method recognizes three forms of call:
// valueOf(Node theNode,String "@attrName")
// valueOf(Node theNode,String ".")
// valueOf(Node theNode,String "nodeName")
//
//In the first form, the method returns the
// text value of the named attribute of
// theNode. An attribute is specified by a
// select value that begins with @. If the
// attribute doesn't exist, the method returns
// an empty string.
//
//In the second form, the method returns the
// concatenated text values of descendants of
// the context node.
//
//In the third form, the method returns the
// concatenated text values of all descendants
// of a specified child node of the context
// node. If the context node has more than one
// child node with the specified name, only the
// first one found is processed. The others
// are ignored.
//
//The method does not support the following,
// which are standard features of xsl:value-of:
// disable-output-escaping
// processing instruction nodes
// comment nodes
// namespace nodes
///

public String valueOf(Node node,String select){

if(select != null
&& select.charAt(0) == '@'){
//This is a request for the value of an
// attribute. Returns empty string if the
// attribute doesn't exist on the element.
String attrName = select.substring(1);
NamedNodeMap attrList =
node.getAttributes();
Node attrNode = attrList.getNamedItem(
attrName);
if(attrNode != null){
return attrNode.getNodeValue();
}else{
return "";//empty string
}//end else
}//end if on @

else if(select != null
&& select.equals(".")){
//This is a request to process the context
// node
int nodeType = node.getNodeType();
if(nodeType == Node.ELEMENT_NODE){
//Process the context node as an element
// node. Return the concatenated text
// values of all descendants of the
// context node.
NodeList childNodes =
node.getChildNodes();
int listLen = childNodes.getLength();
String nodeTextValue = "";//result

for(int j = 0; j < listLen; j++){
nodeTextValue +=
valueOf(childNodes.item(j),".");
}//end for loop
return nodeTextValue;
}else if(nodeType == Node.TEXT_NODE){
//Process the context node as a text
// node. Simply get and return its
// value.
return node.getNodeValue();
}else{
//ignore all other context node types
}//end else
}//end if for context node

else if(select != null){
//Process a child node whose name is
// specified by the value of the incoming
// parameter named select. Get and return
// the concatenated text values of all
// descendants of the specified child node.
//This process assumes that there is only
// one child node with the specified name
// and processes the first one that it
// finds.
NodeList children = node.getChildNodes();
int len = children.getLength();
for (int i = 0; i < len; i++){
//Trap the specified child node
if(children.item(i).getNodeName().
equals(select)){
//Make a recursive call and let
// existing code do the work.
return valueOf(children.item(i),".");
//The above return statement causes any
// additional child nodes having the
// same name to be ignored.
}//end if getNodeName == select
}//end for loop on all child nodes
}//end else if(select != null)
//Will reach here if select is null, if the
// context node type is not supported, or if
// no matching child node was found.
///
return "";//empty string
}//end method valueOf
//-------------------------------------------//

//This method uses an incoming XSLT stylesheet
// file to transform an incoming Document
// object into an output file. Note that the
// successful invocation of this method depends
// on the processing instruction containing the
// stylesheet having been saved in a Vector
// object that is received as an incoming
// parameter. Otherwise, this method would
// have to search the DOM for the stylesheet
// processing instruction.
///
void doXslTransform(Document document,
String outFile,
Vector procInstr)
throws Exception{
try{
//Get stylesheet ID from proc instr.
ProcessingInstruction pi = null;
boolean piFlag = false;
int size = procInstr.size();
//Search for a stylesheet in the Vector
// containing processing instruction nodes.
///
for(int i = 0; i < size; i++){
pi = (ProcessingInstruction)procInstr.
get(i);
if(pi.getTarget().startsWith(
"xml-stylesheet") && pi.getData().
startsWith("type=\"text/xsl\"")){
//Looks like a good stylesheet.
///
piFlag = true;
break;
}//end if
}//end for loop
if(piFlag == false){//still false?
throw new Exception(
"No valid stylesheet");
}//end if
//Get the stylesheet file reference
///
String xslFile = pi.getData().
substring(pi.getData().indexOf(
"href=")+6);
//Eliminate the quotation mark at the end
///
xslFile = xslFile.substring(
0,xslFile.length()-1);

//Get a TransformerFactory object
///
TransformerFactory xformFactory =
TransformerFactory.newInstance();
//Get an XSL Transformer object based on
// the XSL file discovered above.
///
Transformer transformer =
xformFactory.newTransformer(
new StreamSource(
new File(xslFile)));
//Get a DOMSource object that represents
// the DOM tree.
///
DOMSource source = new DOMSource(document);

//Get an output stream for the output
// file.
///
PrintWriter xformStream = new PrintWriter(
new FileOutputStream(outFile));

//Get a StreamResult object that points to
// the output file. Then transform the DOM
// sending text to the output file.
///
StreamResult xformResult =
new StreamResult(xformStream);

//Do the transform
///
transformer.transform(source,xformResult);
//Close the output file.
///
xformStream.close();
}catch(Exception e){
e.printStackTrace(System.err);
}//end catch

}//end doXslTransform

}// class Dom11

Listing 31

<?xml version="1.0"?>

<!DOCTYPE top [
<!ELEMENT top (theData)*>
<!ELEMENT theData (title,author,price)*>
<!ELEMENT title (#PCDATA | subtitle)*>
<!ELEMENT author (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT subtitle (#PCDATA)>
<!ATTLIST theData attr CDATA #IMPLIED>
<!ATTLIST subtitle position CDATA #IMPLIED>
]>




<?dummy-target dummy-data="def"?>
<?xml-stylesheet
type="text/xsl" href="Dom11a.xsl"?>
<?false-target false-data="ghi"?>

<top>

<theData attr="Dummy Attr Value">
<title>Java
<subtitle position="Low">really</subtitle>rules
</title>
<author>R.Baldwin</author>
<price>$9.95</price>
</theData>

<theData>
<title>Python</title>
<author>R.Baldwin</author>
<price>$15.42</price>
</theData>

<theData>
<title>XML</title>
<author>R.Baldwin</author>
<price>$19.60</price>
</theData>

</top>

Listing 32

<?xml version='1.0'?>

<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

</xsl:stylesheet>

Listing 33

About the author

Richard Baldwin is a college professor (at Austin Community College in Austin, TX) and private consultant whose primary focus is a combination of Java, C#, and XML. In addition to the many platform and/or language independent benefits of Java and C# applications, he believes that a combination of Java, C#, and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects, and he frequently provides onsite training at the high-tech companies located in and around Austin, Texas. He is the author of Baldwin's Programming Tutorials, which has gained a worldwide following among experienced and aspiring programmers. He has also published articles in JavaPro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.

Baldwin@DickBaldwin.com

-end-