One of my early Java interests was discovering
ways to process XML files.
I soon realised that, as well as the low-level XML APIs with Java
bindings (such as DOM and SAX), there were a number of other
approaches to processing XML (such as JDOM). In general terms
these did more of the work of creating Java classes to represent XML
structures.
I decided to concentrate first on the low-level APIs,
DOM and SAX. I wanted to find the minimum steps needed to set
up a working environment for processing XML files
with each of them.
This page describes my work on SAX; there is a separate DOM page.
An article on
Mapping XML to Java
by Robert Hustead got me started. These are the steps which followed:
- I made sure PATH was set to include the Java utilities from the
Sun JDK I had downloaded:
PATH=/usr/local/jdk1.2.2/bin:$PATH
export PATH
This overrides the older versions of the Java utilities supplied with
my Linux installation.
- I followed the links to David Megginson's SAX pages,
where I downloaded the SAX 2.0 Java package as a file, sax2.zip.
I unpacked this using unzip; the file can also be unpacked on
Windows using WinZip.
- As recommended I downloaded the Xerces Java parser
from the Apache Software Foundation XML project site.
I chose Xerces Java version 1 (to be precise, 1.3.1) as version 2
seemed still to be in development. The documentation (see below)
confirmed it supported the SAX 2.0 API.
I then ran into some problems unpacking the downloaded file.
Although the link implies the file extension is ".tar.gz", the file
actually downloaded was Xerces-J-bin_1_3_1_tar.tar. The contents were
still gzip-compressed, so the tar -z flag had to be specified to filter
the file through gzip. Using tar any other way (or attempting to
uncompress first using gunzip) was of no use. Luckily my version of
GNU tar then worked fine:
tar -xvzf Xerces-J-bin_1_3_1_tar.tar
- To read the documentation for Xerces 1.3.1 I used the
following commands:
cd xerces-1_3_1/docs/html
netscape `pwd`/index.html
To read the documentation for SAX 2.0 I
used the following commands:
cd sax2/docs
netscape `pwd`/sax2.html
In both cases a link then takes you straight to the API documentation.
- I then followed the Quick Start guide in the SAX
documentation.
In order to compile the sample code, it is necessary to set CLASSPATH
in the environment to include both the SAX and Xerces jar files, in my
case:
CLASSPATH=/mnt/DOS_hda1/Linux/sax2/sax2.jar:/mnt/DOS_hda1/Linux/xerces1.3.1/xerces-1_3_1/xerces.jar
export CLASSPATH
This can also be done by using the -classpath argument to the Java
compiler:
javac -classpath /mnt/DOS_hda1/Linux/sax2/sax2.jar:/mnt/DOS_hda1/Linux/xerces1.3.1/xerces-1_3_1/xerces.jar \
  MySAXApp.java
This should create the Java class file MySAXApp.class.
- When running the compiled Java class in the Java
interpreter, you will get a Java runtime error message saying
Exception in thread "main" java.lang.NoClassDefFoundError: MySAXApp
unless you also add "." to your classpath, or else specify this on the
command line. This just makes Java look in the current directory for
the class file compiled in the previous step.
- As mentioned in the SAX documentation, you need to
specify a Java property on the command line to identify the name of the
SAX2 driver class provided by the XML Apache parser. I found the class
name by hunting through the API documentation:
java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser \
  MySAXApp
- Now I had a working XML parser and sample code to
read XML files and send them
to the parser. A few simple tests proved that incorrect XML files
yielded sensible error messages:
java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser \
  MySAXApp file1.xml file2.xml [...]
- Finally I wanted to put a try block around the setup
code, and one around the file reading code, to handle exceptions. I was
not entirely happy with declaring that main "throws Exception".
Surprise, surprise, Java said I had inadvertently declared a variable
in the first try block and attempted to use it in the second block. At
last a block structured language that stops you writing spaghetti. I
got round this by having a high-level block to catch SAX exceptions,
and a block around the file-reading code just to catch
FileNotFoundException and IOException.
So no more spaghetti from me.
In making this change I discovered that, since
FileNotFoundException extends IOException, you have to catch it first
or else you have a code block that Java realises is unreachable.
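The try-block layout described in the last step can be sketched roughly as follows. This is an illustration, not the original MySAXApp: the class name TryBlockSketch and the helper parseFile are made up for the example, the syntax is newer than JDK 1.2.2 (try-with-resources, for-each), and it assumes XMLReaderFactory can find a parser, either through org.xml.sax.driver or through the default built into current JDKs.

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical class name; not the MySAXApp from the SAX Quick Start guide.
class TryBlockSketch {

    // Inner block: wraps only the file-reading code and handles only I/O
    // exceptions. SAXException is left to propagate to the outer block.
    static String parseFile(XMLReader reader, String fileName) throws SAXException {
        try (Reader in = new FileReader(fileName)) {
            reader.parse(new InputSource(in));
            return "parsed";
        } catch (FileNotFoundException e) {
            // Must come before IOException: it is a subclass, so the
            // reverse order is rejected by javac as an unreachable catch.
            return "not found";
        } catch (IOException e) {
            return "I/O error";
        }
    }

    public static void main(String[] args) {
        // Outer block: setup and parsing share one scope, so the reader
        // declared here is still visible to the loop that uses it.
        try {
            XMLReader reader = XMLReaderFactory.createXMLReader();
            DefaultHandler handler = new DefaultHandler();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler); // fatal errors are rethrown
            for (String fileName : args) {
                System.out.println(fileName + ": " + parseFile(reader, fileName));
            }
        } catch (SAXException e) {
            System.err.println("SAX error: " + e.getMessage());
        }
    }
}
```

Keeping the reader's declaration in the outer block is what avoids the scoping error mentioned above: nothing declared inside the inner try is needed once that block ends.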
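Looking back at the driver step: the org.xml.sax.driver value given with -D is an ordinary system property that XMLReaderFactory consults, so the driver class name can also be passed in code. A small sketch of both routes, in which DriverSketch and openReader are hypothetical names, and in which org.apache.xerces.parsers.SAXParser will only load if xerces.jar is on the classpath:

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class, for illustration only.
class DriverSketch {

    // With a class name, load that driver explicitly; with null, let
    // XMLReaderFactory consult the org.xml.sax.driver system property.
    static XMLReader openReader(String driverClass) throws SAXException {
        if (driverClass != null) {
            return XMLReaderFactory.createXMLReader(driverClass);
        }
        return XMLReaderFactory.createXMLReader();
    }

    public static void main(String[] args) {
        // Optional first argument: the driver class name.
        String driver = args.length > 0 ? args[0] : null;
        try {
            XMLReader reader = openReader(driver);
            System.out.println("Using " + reader.getClass().getName());
        } catch (SAXException e) {
            // Typically the named class is not on the classpath.
            System.err.println("No SAX driver: " + e.getMessage());
        }
    }
}
```

Running it with org.apache.xerces.parsers.SAXParser as the argument should pick the same parser as the -D form, under the classpath assumption above. Passing the name on the command line keeps the choice of parser out of the source, which is presumably why the SAX documentation recommends the property.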