Java StAX

Jakob Jenkov
Last update: 2014-05-21

The StAX Java API for XML processing is designed for parsing XML streams, just like the SAX API. The main differences between the StAX and SAX APIs are:

  • StAX is a "pull" API. SAX is a "push" API.
  • StAX can do both XML reading and writing. SAX can only do XML reading.
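To make the "writing" half of the second point concrete, here is a minimal sketch using the cursor-style writer, XMLStreamWriter. The class name StaxWriteSketch, the writeBook() method and the element names are made up for this example, and the exact form of the XML declaration in the output may vary between StAX implementations.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class, not part of the StAX API.
class StaxWriteSketch {

    // Builds a tiny XML document in memory with the StAX stream writer.
    static String writeBook(String title) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);

        writer.writeStartDocument();
        writer.writeStartElement("book");   // opens <book ...>
        writer.writeAttribute("id", "1");   // adds id="1" to <book>
        writer.writeStartElement("title");  // opens <title>
        writer.writeCharacters(title);      // text content, escaped if needed
        writer.writeEndElement();           // closes </title>
        writer.writeEndElement();           // closes </book>
        writer.writeEndDocument();
        writer.close();

        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeBook("StAX & SAX"));
    }
}
```

Note that the writer mirrors the reader: you emit one "item" at a time, in document order.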

It is pretty obvious what the difference between a "read + write" capable API vs. a "read" capable API is. But the difference between a "pull" and a "push" style API is less obvious, so I'll talk a little about that. For a more feature-by-feature type comparison of SAX and StAX, see the text SAX vs. StAX.

NOTE: This text uses SVG (Scalable Vector Graphics) diagrams. If you are using Internet Explorer you will need the Adobe SVG Plugin to display these diagrams. Firefox 3.0.5+ users and Google Chrome users should have no problems.

"Pull" vs. "Push" Style API

SAX is a push style API. This means that the SAX parser iterates through the XML and calls methods on the handler object provided by you. For instance, when the SAX parser encounters the beginning of an XML element, it calls the startElement() method on your handler object. It "pushes" the information from the XML into your object. Hence the name "push" style API. This is also referred to as an "event driven" API. Your handler object is notified with event calls when something interesting is found in the XML document ("interesting" = elements, texts, comments etc.).

The push style parsing of the SAX parser is illustrated here:

[Diagram: SAX Parser → Your App (the parser calls your handler)]
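In code, the push style might look like the sketch below. The class names SaxPushSketch and NameCollector, the elementNames() method and the sample XML are assumptions for illustration only. The thing to notice is that your code never asks for the next element; the parser calls startElement() whenever it finds one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of the SAX API.
class SaxPushSketch {

    // The handler only reacts; the parser decides when each method is called.
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            names.add(qName);
        }
    }

    static List<String> elementNames(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        NameCollector handler = new NameCollector();
        // parse() walks the whole document and pushes events into the handler.
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
                elementNames("<books><book><title>StAX</title></book></books>"));
    }
}
```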

StAX is a pull style API. This means that you have to move the StAX parser from item to item in the XML file yourself, just like you do with a standard Iterator or JDBC ResultSet. You can then access the XML information via the StAX parser for each such "item" encountered in the XML file ("item" = elements, texts, comments etc.).

The pull style parsing of the StAX parser is illustrated here:

[Diagram: Your App → StAX Parser (your code asks the parser for the next item)]

In fact, StAX has two different reader APIs: one that looks most like using an Iterator, and one that looks most like using a ResultSet. These are called the "iterator" and "cursor" readers.

So, what is the difference between these two readers?

The iterator reader returns an XML event object from its nextEvent() calls. From this event object you can see what type of event you have encountered (element, text, comment etc.). This event object is immutable, and can be passed around to other parts of your application. You can also hang on to earlier event objects when iterating to the next event. As you can see, this works very much like how you use an ordinary Iterator when iterating over a collection. Here, you are just iterating over XML events. Here's a sketch:

XMLEventReader reader = ...;

while(reader.hasNext()){
    XMLEvent event = reader.nextEvent();

    if(event.getEventType() == XMLEvent.START_ELEMENT){
        StartElement startElement = event.asStartElement();
        System.out.println(startElement.getName().getLocalPart());
    }
    //... more event types handled here...
}
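A self-contained variant of the sketch above might look like this. The class name IteratorReaderSketch, the describe() method and the sample XML are made up for this example, and it assumes a StAX implementation is available on the classpath (recent JDKs ship one).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class, not part of the StAX API.
class IteratorReaderSketch {

    // Returns a short description of each element and text event in the XML.
    static List<String> describe(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent text into one Characters event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader =
                factory.createXMLEventReader(new StringReader(xml));

        List<String> seen = new ArrayList<>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();   // we pull the next event
            if (event.isStartElement()) {
                seen.add("start:"
                        + event.asStartElement().getName().getLocalPart());
            } else if (event.isCharacters()
                    && !event.asCharacters().isWhiteSpace()) {
                seen.add("text:" + event.asCharacters().getData());
            } else if (event.isEndElement()) {
                seen.add("end:"
                        + event.asEndElement().getName().getLocalPart());
            }
        }
        reader.close();
        return seen;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(describe("<book><title>StAX</title></book>"));
    }
}
```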

The cursor reader does not return events from its next() call. Rather, this call moves the cursor to the next "event" in the XML. You can then call methods directly on the cursor to obtain more information about the current event. This is very similar to how you iterate the records of a JDBC ResultSet, and call methods like getString() or getLong() to get values from the current record pointed to by the ResultSet. Here is a sketch:

XMLStreamReader streamReader = ...;

while(streamReader.hasNext()){
    int eventType = streamReader.next();

    if(eventType == XMLStreamReader.START_ELEMENT){
        System.out.println(streamReader.getLocalName());
    }

    //... more event types handled here...
}
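Here is a slightly fuller, runnable variant of the cursor sketch. The class name CursorReaderSketch, the titlesById() method and the book/title/id names in the sample XML are assumptions for illustration. Since the cursor only knows about the current position, the id attribute is copied into a local variable before moving on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of the StAX API.
class CursorReaderSketch {

    // Collects "id:title" strings from <book id="..."><title>...</title></book>.
    static List<String> titlesById(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));

        List<String> titles = new ArrayList<>();
        String currentId = null;   // copied out, since the cursor moves on

        while (reader.hasNext()) {
            if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                continue;          // only start tags matter in this sketch
            }
            String name = reader.getLocalName();
            if ("book".equals(name)) {
                // null namespace: match the attribute by local name only
                currentId = reader.getAttributeValue(null, "id");
            } else if ("title".equals(name)) {
                // getElementText() reads the text and leaves the cursor
                // on the matching end tag
                titles.add(currentId + ":" + reader.getElementText());
            }
        }
        reader.close();
        return titles;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(titlesById(
                "<books><book id=\"1\"><title>StAX</title></book>"
                + "<book id=\"2\"><title>SAX</title></book></books>"));
    }
}
```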

So, one of the main differences is that you can hang on to earlier XML event objects when using the iterator style API. You cannot do this when using the cursor style API. Once you move the cursor to the next event in the XML stream, you have no information about the previous event unless you copied it out yourself. This speaks in favour of using the iterator style API.
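Hanging on to earlier events can be sketched like this. The example keeps the StartElement events of all currently open elements on a stack, and uses them later to print the path of each piece of text. The class name EventHistorySketch, the textWithPath() method and the sample XML are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class, not part of the StAX API.
class EventHistorySketch {

    // Returns entries like "/books/book/title=StAX" for each text node.
    static List<String> textWithPath(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent text into one Characters event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader =
                factory.createXMLEventReader(new StringReader(xml));

        Deque<StartElement> open = new ArrayDeque<>();  // earlier events
        List<String> result = new ArrayList<>();

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                open.addLast(event.asStartElement());   // keep the event
            } else if (event.isEndElement()) {
                open.removeLast();
            } else if (event.isCharacters()
                    && !event.asCharacters().isWhiteSpace()) {
                StringBuilder path = new StringBuilder();
                for (StartElement ancestor : open) {    // outermost first
                    path.append('/')
                        .append(ancestor.getName().getLocalPart());
                }
                result.add(path + "=" + event.asCharacters().getData());
            }
        }
        reader.close();
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(textWithPath(
                "<books><book><title>StAX</title></book></books>"));
    }
}
```

With the cursor reader you would have to copy the element names into your own data structure instead, because there are no event objects to keep.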

However, the cursor style API is said to be more memory-efficient than the iterator style API, since it does not have to create an event object for every item in the XML. So, if your application needs absolute top performance, use the cursor style API.

Both of these StAX APIs will be covered in more detail in later texts. See the table of contents on the right side of this page.

Java StAX Implementation

Since Java 6, the StAX API (the javax.xml.stream packages) is bundled with the JDK together with a default implementation, so you can get started without any extra libraries. A standalone implementation has also been made available here:

http://stax.codehaus.org/
