Java DOM: The Document Object
Jakob Jenkov
The DOM Document object in the Java DOM API represents an XML document. When you parse an XML file using a Java DOM parser, you get back a Document object. In this Java Document tutorial I will give you a head start in traversing a DOM graph. I cannot cover it all, but that isn't necessary. You just need enough to get the picture. The rest you can read in the JavaDoc.
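To make the parsing step concrete, here is a minimal sketch of how a Document can be obtained with the DOM parser that ships with the JDK (the javax.xml.parsers package). The class name ParseExample, the helper method parse() and the sample XML are made up for this example; they are not part of the DOM API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class ParseExample {

    // Parses an XML string into a DOM Document with the parser bundled in the JDK.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return builder.parse(new ByteArrayInputStream(bytes));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample XML, made up for this example.
        Document document = parse("<shapes><box width=\"10\"/></shapes>");

        // The Document is itself a node in the tree; its node name is "#document".
        System.out.println(document.getNodeName());
    }
}
```

If the XML lives in a file rather than a string, DocumentBuilder also has a parse() method that takes a java.io.File.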
The two most commonly used features of DOM are:
- Accessing Child Elements of an Element
- Accessing Attributes of an Element
It is these two primary features that this text covers.
The Document interface and all related interfaces are located in the Java package org.w3c.dom, because they were designed by the World Wide Web Consortium (W3C). You need to know this when looking for the DOM interfaces in the JavaDoc.
The DOM Document Element
A DOM document contains a lot of different nodes connected in a tree-like structure. At the top is the Document object. The Document object has a single root element, which is returned by calling getDocumentElement() like this:
Element rootElement = document.getDocumentElement();
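Here is a small self-contained sketch of that call in context. The class name RootElementExample, the helper methods and the sample XML are assumptions made for illustration. Note that the sample has a comment before the root element; the comment belongs to the Document, so getDocumentElement() still returns the shapes element.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class RootElementExample {

    // Hypothetical helper: parses an XML string with the JDK's built-in DOM parser.
    static Document parse(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the tag name of the single root element of the document.
    static String rootTagName(String xml) {
        Document document = parse(xml);
        Element rootElement = document.getDocumentElement();
        return rootElement.getTagName();
    }

    public static void main(String[] args) {
        // Made-up sample XML with a comment in front of the root element.
        System.out.println(rootTagName("<!-- a comment --><shapes><box/></shapes>"));
    }
}
```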
DOM Elements, Child Elements, and the Node Interface
The root element has children, which can be elements, comments, processing instructions, text etc. You get the children of an element like this:
NodeList nodes = element.getChildNodes();

for(int i=0; i<nodes.getLength(); i++){
  Node node = nodes.item(i);

  if(node instanceof Element){
    //a child element to process
    Element child = (Element) node;
    String attribute = child.getAttribute("width");
  }
}
The getChildNodes() method returns a NodeList object, which is a list of Node objects. The Node interface is a superinterface for pretty much all of the different node types in DOM. This means that the Document interface inherits from (extends) Node, the Element interface extends Node, the Attr (attribute) interface extends Node etc.
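To show why the instanceof check matters, here is a runnable sketch that lists only the child elements of a root element. The class name ChildElementsExample, its helper methods and the sample XML are made up for this example. With a default, non-validating parser, the whitespace between tags and the comment are also returned by getChildNodes(), so the list of child nodes is normally longer than the list of child elements.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class ChildElementsExample {

    // Hypothetical helper: parses an XML string with the JDK's built-in DOM parser.
    static Document parse(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Number of direct child nodes of the root element, of any node type.
    static int childNodeCount(String xml) {
        return parse(xml).getDocumentElement().getChildNodes().getLength();
    }

    // Tag names of the direct child elements of the root element, in document order.
    static List<String> childElementNames(String xml) {
        NodeList nodes = parse(xml).getDocumentElement().getChildNodes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node instanceof Element) {
                // Text and comment nodes are skipped by the instanceof check.
                names.add(((Element) node).getTagName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample XML with whitespace and a comment between the elements.
        String xml = "<shapes>\n  <box/>\n  <!-- note -->\n  <circle/>\n</shapes>";
        System.out.println("child nodes:    " + childNodeCount(xml));
        System.out.println("child elements: " + childElementNames(xml));
    }
}
```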
The fact that Node is the superinterface of all the node interfaces in DOM means that you will sometimes have to look in the Node interface for the methods you need, like the method getChildNodes(). This is something to be aware of when trying to iterate through a Document graph.
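As an illustration, the following sketch walks a whole document using only methods declared in the Node interface (getNodeType() and getChildNodes()). The Document can be passed directly to a method that takes a Node, because Document extends Node. The class name NodeWalkExample, the method countElements() and the sample XML are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class NodeWalkExample {

    // Hypothetical helper: parses an XML string with the JDK's built-in DOM parser.
    static Document parse(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Counts the element nodes in the subtree rooted at the given node,
    // using only methods declared in the Node interface.
    static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            count += countElements(children.item(i));
        }
        return count;
    }

    // Convenience overload: parses the XML and counts the elements in the whole document.
    static int countElements(String xml) {
        // A Document is a Node, so it can be handed to the Node-based method as-is.
        return countElements(parse(xml));
    }

    public static void main(String[] args) {
        // Made-up sample XML containing four elements: a, b, c and d.
        System.out.println(countElements("<a><b><c/></b><d/></a>"));
    }
}
```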
DOM Element Attributes
As you have already seen, you can access the attributes of an element via the Element interface. There are two ways to do so:
String attrValue = element.getAttribute("attrName");
Attr attribute = element.getAttributeNode("attrName");
Most of the time the getAttribute() method will do just fine.
The Attr interface extends Node. It allows you to access the owning element via the method getOwnerElement() etc. Accessing an attribute via this interface is mostly handy if you need to pass the attribute to one or more methods, where those methods need more information about the attribute in order to process it.
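The two access styles can be compared in a short sketch. The class name AttributeExample, its helper methods and the sample XML are made up for this example. One difference worth knowing: for an attribute that is not present, getAttribute() returns an empty string, while getAttributeNode() returns null.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class AttributeExample {

    // Hypothetical helper: parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Style 1: read the attribute value directly as a String.
    static String attributeValue(String xml, String name) {
        return parseRoot(xml).getAttribute(name);
    }

    // A method that only receives the Attr can still find the element it belongs to.
    static String describe(Attr attribute) {
        return attribute.getOwnerElement().getTagName()
                + "@" + attribute.getName()
                + "=" + attribute.getValue();
    }

    // Style 2: fetch the Attr node and pass it on for further processing.
    static String describeAttribute(String xml, String name) {
        Attr attribute = parseRoot(xml).getAttributeNode(name);
        return attribute == null ? "(absent)" : describe(attribute);
    }

    public static void main(String[] args) {
        // Made-up sample XML with a single attribute.
        String xml = "<box width=\"10\"/>";
        System.out.println(attributeValue(xml, "width"));
        System.out.println(describeAttribute(xml, "width"));
        System.out.println(describeAttribute(xml, "height"));
    }
}
```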
There is a lot more you can do with the Document object and the related nodes, but accessing child elements and attributes is what you will be doing most of the time. The rest you can find by checking out the JavaDoc. Sooner or later you will have to do that anyway.