Dave Cater

Java - XML parsing using DOM

 
By now, you'll probably have come across my notes on processing XML using SAX. If not, do take a look.

This page describes my initial explorations of DOM (Document Object Model), the second low-level XML API. DOM converts an XML document into a tree structure. This is clearly useful for random access into modestly sized XML documents, but less useful than SAX for large documents, because the entire document is read into memory to build the tree.
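
To illustrate what random access means here, the following is a minimal sketch that jumps straight to the n-th occurrence of an element once the tree is in memory. The class name DomRandomAccess, the method textOfNth and the element names in the usage note are made up for illustration. It uses getTextContent(), which belongs to DOM Level 3 and so needs a more recent JDK than the one mentioned on this page.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original program.
class DomRandomAccess {

    // Return the text of the index-th <tagName> element, or null if there is none.
    static String textOfNth(String xml, String tagName, int index) {
        try {
            // The whole document is parsed into memory before anything is looked up.
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Once the tree exists, any element can be reached in any order.
            NodeList matches = document.getElementsByTagName(tagName);
            Node match = matches.item(index); // item() returns null when out of range
            return match == null ? null : match.getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

For example, textOfNth(xml, "title", 1) reads the second title element without visiting the first, which SAX could only do by streaming through everything before it.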

Using the methods defined in the Java org.w3c.dom package, I developed a Java program to read an XML file into a DOM structure, and print the DOM in a simple tree format. This involved a recursive function to print the members of a Node, and its children (indented one tab stop). For an Element type node, the tag was extracted using getTagName(), and the list of attributes using getAttributes(). A single call to printtree(document) in main() then printed the entire XML document specified on the command line.
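
A program along these lines might look roughly like the sketch below. It is a reconstruction under assumptions rather than the original source: the class name DomTreePrinter, the parseString helper and the exact output layout (attributes printed as name="value", other nodes as name: value) are illustrative choices. Only printtree, getTagName() and getAttributes() come from the description above.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative class name; the layout of the output is an assumption.
class DomTreePrinter {

    // Append one line for this node, then recurse into its children one tab stop deeper.
    static void printtree(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append('\t');
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            out.append(element.getTagName());
            NamedNodeMap attributes = element.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                out.append(' ').append(attribute.getNodeName())
                   .append("=\"").append(attribute.getNodeValue()).append('"');
            }
        } else {
            // Document, text, comment and other nodes: print name and, if present, value.
            out.append(node.getNodeName());
            if (node.getNodeValue() != null) {
                out.append(": ").append(node.getNodeValue().trim());
            }
        }
        out.append('\n');
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            printtree(child, depth + 1, out);
        }
    }

    // Convenience wrapper: render the whole subtree rooted at node as a string.
    static String printtree(Node node) {
        StringBuilder out = new StringBuilder();
        printtree(node, 0, out);
        return out.toString();
    }

    // Hypothetical helper for trying the printer on an in-memory string.
    static Document parseString(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java DomTreePrinter file.xml");
            return;
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new File(args[0]));
        System.out.print(printtree(document)); // single call prints the entire document
    }
}
```

Note that whitespace between tags shows up as its own #text node in the tree, so an indented XML file produces more lines of output than it has elements.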

Details of the process:

  • I looked first at the DOM tutorial from Sun Microsystems which is actually part of their JAXP tutorial set.

  • I then downloaded the JAXP software from Sun Microsystems and unpacked the file jaxp-1_1.zip using the unzip command.

  • I made sure PATH included the Java utilities from the Sun JDK I had downloaded:
        PATH=/usr/local/jdk1.2.2/bin:$PATH; export PATH
    
    This overrides old versions of Java utilities supplied with my Linux installation.

  • I worked through the JAXP DOM example, to the point where an XML document can be turned into a DOM structure.

    In order to compile the sample code, it's necessary to specify the classpath, which was done as follows:

        javac -classpath /mnt/DOS_hda1/Linux/jaxp1.1/jaxp-1.1/jaxp.jar:/mnt/DOS_hda1/Linux/jaxp1.1/jaxp-1.1/crimson.jar \
            DomEcho.java
    
    This should create the Java class file DomEcho.class.

  • I then remembered that the Xerces parser implements the classes necessary to get this far, including the required parts of the JAXP API.

    In other words, with no code changes required, I could build the same DomEcho.class file using the command:

        javac -classpath /mnt/DOS_hda1/Linux/xerces1.3.1/xerces-1_3_1/xerces.jar \
            DomEcho.java
    

  • I developed the remainder of the Java program using the Xerces API documentation for reference.
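
The reason the same source builds against either jar is most likely that the code only refers to the JAXP factory classes and the org.w3c.dom interfaces, assuming DomEcho obtains its parser through DocumentBuilderFactory. JAXP picks the concrete parser at run time, from the javax.xml.parsers.DocumentBuilderFactory system property, a provider found on the classpath, or a built-in default. A small sketch (the class name WhichParser is made up for illustration) reports which implementation was actually chosen:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical diagnostic class, not part of the original program.
class WhichParser {

    // Name of the concrete DocumentBuilder class that JAXP selected at run time.
    static String builderClassName() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.getClass().getName();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable DOM parser found", e);
        }
    }

    public static void main(String[] args) {
        // The printed name depends on which parser jar is on the classpath.
        System.out.println(builderClassName());
    }
}
```

Running this once with the JAXP jars and once with the Xerces jar on the classpath should print different class names, while the source stays the same.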