Java SAX vs. StAX
Jakob Jenkov
Both SAX and StAX are stream / event oriented XML parsers, but there is a subtle difference in how they work. SAX uses a "push" model, and StAX uses a "pull" model. If you are new to these parsers this can be confusing, so I will explain the difference between the two models in a little more detail in this text.
Should you know of any advantages or disadvantages that I have forgotten here, please feel free to send me an email. You can find a working email address on my About page.
The SAX Push Model
The SAX push model means that it is the SAX parser that calls your handler, not your handler that calls the SAX parser. The SAX parser thus "pushes" events into your handler. Here it is, summarized:
SAX Parser --> Handler
With a push model you have no control over how and when the parser iterates over the file. Once you start the parser, it iterates all the way to the end, calling your handler for each and every XML event in the input XML document. Short of throwing an exception from the handler, there is no way to stop it early.
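To make the push model concrete, here is a minimal sketch using the standard JAXP SAX API. The class and method names (SaxPushDemo, NameCollector, elementNames) are my own illustrative names, and the XML is passed in as a string only to keep the example self-contained. Notice that the code never asks the parser for the next event; it hands over a handler and waits until parse() returns.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative names only; not part of any library.
class SaxPushDemo {

    /** Records the name of every start element the parser pushes to it. */
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            // Called by the parser, once per start tag, in document order.
            names.add(qName);
        }
    }

    /** Parses the whole document and returns the element names seen. */
    static List<String> elementNames(String xml) {
        NameCollector handler = new NameCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        } catch (Exception e) {
            // Wrapped to keep the sketch short; real code would handle
            // SAXException and IOException separately.
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return handler.names;
    }

    public static void main(String[] args) {
        // prints [transportInfo, driver, vehicle]
        System.out.println(elementNames(
            "<transportInfo><driver/><vehicle/></transportInfo>"));
    }
}
```

The handler only finds out what happened after the parser has walked the entire input.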
The StAX Pull Model
The StAX pull model means that it is your "handler" class that calls the parser, not the other way around. Thus your handler class controls when the parser is to move on to the next event in the input. In other words, your handler "pulls" the XML events out of the parser. Additionally, you can stop the parsing at any point. The pull model is summarized like this:
Handler --> StAX Parser
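A small sketch of the pull model, again with hypothetical names (StaxPullDemo, firstElementText) and a string as input to keep it self-contained. The calling code drives the loop itself and simply returns once it has found what it was looking for, leaving the rest of the document unread.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative names only; not part of any library.
class StaxPullDemo {

    /**
     * Returns the text of the first element with the given name, or null
     * if there is none. Assumes that element contains text only.
     */
    static String firstElementText(String xml, String elementName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // We ask for the next event; the parser does not call us.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        // Found it: stop pulling, the rest is never parsed.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<transportInfo><driver>Anna</driver>"
                   + "<driver>Bob</driver></transportInfo>";
        System.out.println(firstElementText(xml, "driver")); // prints Anna
    }
}
```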
Summary of Advantages and Disadvantages
The StAX pull model has a few advantages over the SAX push model (one of the few cases where "inversion of control" is not an advantage). I have summarized the positives and negatives of SAX and StAX in the table below:
Parser | Advantages (+) | Disadvantages (-) |
SAX | Schema validation | Subparsing / delegation is awkward; no support for XML writing |
StAX | Subparsing / delegation possible; support for XML writing | No schema validation |
StAX Allows Subparsing / Delegation
One big advantage of StAX over SAX is that the pull model allows subparsing of the XML input by methods and components. What do I mean by that?
First, here is an XML example:
<transportInfo>
    <driver>...</driver>
    <driver>...</driver>
    <vehicle>...</vehicle>
    <vehicle>...</vehicle>
</transportInfo>
Second, look at this StAX StreamReader example:
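import java.io.FileReader;
import java.io.IOException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// The class name and the factory field are illustrative additions;
// the methods themselves are the point of the example.
public class TransportInfoParser {

    private final XMLInputFactory factory = XMLInputFactory.newInstance();

    public void parse() throws IOException, XMLStreamException {
        try (FileReader fileReader = new FileReader("data\\test.xml")) {
            XMLStreamReader streamReader =
                factory.createXMLStreamReader(fileReader);

            while (streamReader.hasNext()) {
                streamReader.next();
                if (streamReader.getEventType() == XMLStreamReader.START_ELEMENT) {
                    String elementName = streamReader.getLocalName();
                    // Hand the reader over to whichever method owns this element.
                    if ("driver".equals(elementName)) {
                        parseDriverAndAllChildren(streamReader);
                    } else if ("vehicle".equals(elementName)) {
                        parseVehicleAndAllChildren(streamReader);
                    }
                }
            }
            streamReader.close();
        }
    }

    public void parseDriverAndAllChildren(XMLStreamReader streamReader)
            throws XMLStreamException {
        while (streamReader.hasNext()) {
            streamReader.next();
            if (streamReader.getEventType() == XMLStreamReader.END_ELEMENT) {
                // Return control to parse() once </driver> is reached.
                if ("driver".equals(streamReader.getLocalName())) {
                    return;
                }
            } else if (streamReader.getEventType() == XMLStreamReader.START_ELEMENT) {
                // do something to child elements...
            }
        }
    }

    public void parseVehicleAndAllChildren(XMLStreamReader streamReader)
            throws XMLStreamException {
        while (streamReader.hasNext()) {
            streamReader.next();
            if (streamReader.getEventType() == XMLStreamReader.END_ELEMENT) {
                // Return control to parse() once </vehicle> is reached.
                if ("vehicle".equals(streamReader.getLocalName())) {
                    return;
                }
            } else if (streamReader.getEventType() == XMLStreamReader.START_ELEMENT) {
                // do something to child elements...
            }
        }
    }
}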
Notice how each of the methods parseDriverAndAllChildren() and parseVehicleAndAllChildren() is capable of continuing the parsing loop (while(streamReader.hasNext()){ ... }) and processing all elements related to the "driver" / "vehicle" element it is responsible for. When the matching end element is reached, the method returns and the loop in parse() carries on from there.
If you were to do this using a SAX handler, things would get ugly. You would have to set a flag inside the handler object to keep track of which element you are currently inside. Delegating the parsing and handling of sub-parts of the XML document to a method or component is possible, but not nearly as easy as shown above.
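For comparison, here is a rough sketch of what the flag-based SAX version might look like. The names (SaxFlagDemo, TransportHandler, countChildren) are hypothetical, and the handler merely counts child elements as a stand-in for "do something to child elements". It also assumes that "driver" and "vehicle" elements are never nested inside each other.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative names only; not part of any library.
class SaxFlagDemo {

    static class TransportHandler extends DefaultHandler {
        // The "flag": which section we are inside, or null if none.
        private String currentSection;
        int driverChildren;
        int vehicleChildren;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            if (currentSection == null) {
                if ("driver".equals(qName) || "vehicle".equals(qName)) {
                    currentSection = qName;
                }
            } else if ("driver".equals(currentSection)) {
                driverChildren++;   // child of <driver>
            } else {
                vehicleChildren++;  // child of <vehicle>
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Leaving the section: clear the flag again.
            if (qName.equals(currentSection)) {
                currentSection = null;
            }
        }
    }

    /** Returns { children of driver elements, children of vehicle elements }. */
    static int[] countChildren(String xml) {
        TransportHandler handler = new TransportHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return new int[] { handler.driverChildren, handler.vehicleChildren };
    }
}
```

Every callback has to consult the flag before it knows what to do, and all the logic for both sections ends up in the same two methods. With more sections or deeper nesting, the state tracking grows accordingly.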
StAX has Support for XML Writing
SAX does not have support for writing XML. If you do not need to write XML, this is not a problem. If you do need to write XML, this may be annoying, because you will have to come up with your own XML writing mechanism. Not that this is hard, or anything. You just have to do it. In StAX such a mechanism is already built in.
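As a minimal sketch of the built-in writing support, the following uses the standard XMLStreamWriter interface. The class and method names (StaxWriteDemo, writeDrivers) are made up for this example, and the output goes to a string instead of a file to keep things self-contained.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Illustrative names only; not part of any library.
class StaxWriteDemo {

    /** Writes a transportInfo element with one driver element per name. */
    static String writeDrivers(String... names) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("transportInfo");
            for (String name : names) {
                writer.writeStartElement("driver");
                writer.writeCharacters(name); // escapes &, < and > for us
                writer.writeEndElement();
            }
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML writing failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeDrivers("Anna", "Bob"));
    }
}
```

The writer mirrors the reader: you emit start elements, characters and end elements in order, and the writer takes care of the tag syntax and of escaping special characters.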
StAX has NO Support for Schema Validation
As far as I am aware at the time of writing, StAX has no support for XML Schema validation, whereas SAX has that.
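To show what the SAX side of this looks like, here is a small sketch that validates a document against an XML Schema while parsing it with SAX. The class name, the isValid() method and the tiny inline schema are all assumptions made for the example; in practice the schema would normally be loaded from a file.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative names only; not part of any library.
class SaxValidationDemo {

    // Hypothetical schema: the document must be a single <driver> with text.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='driver' type='xs:string'/>"
      + "</xs:schema>";

    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setSchema(schema); // validate while parsing

            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) throws SAXParseException {
                        // DefaultHandler ignores validation errors by default,
                        // so rethrow them to make the parse fail.
                        throw e;
                    }
                });
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or not valid against the schema
        } catch (Exception e) {
            throw new IllegalStateException("Validation setup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<driver>Anna</driver>")); // prints true
        System.out.println(isValid("<vehicle/>"));            // prints false
    }
}
```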