DOM parsing with Xerces

1. Overview

In this tutorial, we’ll discuss how to parse XML into a DOM with Apache Xerces, a mature and established library for parsing and manipulating XML.

There are multiple options to parse an XML document; we’ll focus on DOM parsing in this article. The DOM parser loads a document and creates an entire hierarchical tree in memory.

For an overview of XML library support in Java, check out our previous article.

2. Our Document

Let’s start with the XML document we’re going to use in our example:

<?xml version="1.0"?>
<tutorials>
    <tutorial tutId="01" type="java">
        <title>Guava</title>
        <description>Introduction to Guava</description>
        <date>04/04/2016</date>
        <author>GuavaAuthor</author>
    </tutorial>
...
</tutorials>

Note that our document has a root node called “tutorials” with 4 “tutorial” child nodes. Each of these has 2 attributes: “tutId” and “type”. Also, each “tutorial” has 4 child nodes: “title”, “description”, “date” and “author”.

Now we can continue with parsing this document.

3. Loading XML File

First, we should note that the JDK ships with its own JAXP implementation, which is derived from Apache Xerces, so we don’t need any additional setup.

Let’s jump right into loading our XML file:

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new File("src/test/resources/example_jdom.xml"));
doc.getDocumentElement().normalize();

In the example above, we first obtain an instance of the DocumentBuilder class, then use the parse() method on the XML document to get a Document object representing it.

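We also call normalize() on the root element. It merges adjacent text nodes and removes empty ones, so the tree doesn’t depend on how text happened to be split into nodes. Note that it doesn’t remove the whitespace-only text nodes that sit between elements.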
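To make the effect of normalize() concrete, here is a minimal sketch. It parses a one-element document from a string rather than from the tutorial’s file, so it can run on its own; the NormalizeSketch class and its helper methods are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NormalizeSketch {

    // Parses XML from a string so the sketch does not depend on a file on disk.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
              .newDocumentBuilder()
              .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the child count of <title> before and after normalize().
    static int[] childCounts() {
        Document doc = parse("<title>Guava</title>");
        Element title = doc.getDocumentElement();

        // Appending text programmatically leaves three adjacent Text nodes.
        title.appendChild(doc.createTextNode(" in"));
        title.appendChild(doc.createTextNode(" Practice"));
        int before = title.getChildNodes().getLength();

        // normalize() merges the adjacent Text nodes into a single one.
        title.normalize();
        int after = title.getChildNodes().getLength();

        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] counts = childCounts();
        System.out.println(counts[0] + " -> " + counts[1]); // prints "3 -> 1"
    }
}
```

The text content of the element is the same before and after the call; only the number of Text nodes holding it changes.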

4. Parsing the DOM

Now, let’s explore our XML file.

Let’s start by retrieving all elements with tag “tutorial”. We can do this using the getElementsByTagName() method, which will return a NodeList:

@Test
public void whenGetElementByTag_thenSuccess() {
    NodeList nodeList = doc.getElementsByTagName("tutorial");
    Node first = nodeList.item(0);

    assertEquals(4, nodeList.getLength());
    assertEquals(Node.ELEMENT_NODE, first.getNodeType());
    assertEquals("tutorial", first.getNodeName());        
}

It’s important to note that Node is the primary datatype for DOM components: elements, attributes, and text are all nodes.
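As a quick illustration of that point, the following sketch reads the node type of an element, one of its attributes, and its text child. It uses a small inline document instead of the tutorial’s file, and the NodeTypesSketch class is a hypothetical name chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTypesSketch {

    // Parses XML from a string so the sketch does not depend on a file on disk.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
              .newDocumentBuilder()
              .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the node types of the element, its "type" attribute and its text child.
    static short[] nodeTypes() {
        Document doc = parse("<tutorial type=\"java\">Guava</tutorial>");
        Element tutorial = doc.getDocumentElement();

        Node attribute = tutorial.getAttributeNode("type");
        Node text = tutorial.getFirstChild();

        return new short[] {
            tutorial.getNodeType(),   // Node.ELEMENT_NODE
            attribute.getNodeType(),  // Node.ATTRIBUTE_NODE
            text.getNodeType()        // Node.TEXT_NODE
        };
    }

    public static void main(String[] args) {
        for (short type : nodeTypes()) {
            System.out.println(type);
        }
    }
}
```

All three objects share the Node interface, which is why methods such as getNodeName() and getNodeValue() are available on each of them.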

Next, let’s see how we can get the first element’s attributes using getAttributes():

@Test
public void whenGetFirstElementAttributes_thenSuccess() {
    Node first = doc.getElementsByTagName("tutorial").item(0);
    NamedNodeMap attrList = first.getAttributes();

    assertEquals(2, attrList.getLength());
    
    assertEquals("tutId", attrList.item(0).getNodeName());
    assertEquals("01", attrList.item(0).getNodeValue());
    
    assertEquals("type", attrList.item(1).getNodeName());
    assertEquals("java", attrList.item(1).getNodeValue());
}

Here, we get the NamedNodeMap object, then use the item(index) method to retrieve each node.

For each attribute node, getNodeName() and getNodeValue() return the attribute’s name and value.
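The DOM specification doesn’t guarantee any particular order for the entries of a NamedNodeMap, so index-based access relies on implementation behavior. A sketch of looking attributes up by name instead, using a hypothetical AttributeLookupSketch class and an inline document:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class AttributeLookupSketch {

    // Parses XML from a string so the sketch does not depend on a file on disk.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
              .newDocumentBuilder()
              .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Reads both attributes by name, once through the map and once through Element.
    static String[] readAttributes() {
        Document doc = parse("<tutorial tutId=\"01\" type=\"java\"/>");
        Element tutorial = doc.getDocumentElement();

        // Lookup through the NamedNodeMap, independent of item order.
        NamedNodeMap attrs = tutorial.getAttributes();
        String tutId = attrs.getNamedItem("tutId").getNodeValue();

        // Shortcut on Element that returns the value directly.
        String type = tutorial.getAttribute("type");

        return new String[] { tutId, type };
    }

    public static void main(String[] args) {
        String[] values = readAttributes();
        System.out.println(values[0] + ", " + values[1]); // prints "01, java"
    }
}
```

Note that getNamedItem() returns null for a missing attribute, so real code would check the result before calling getNodeValue().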

5. Traversing Nodes

Next, let’s see how to traverse DOM nodes.

In the following test, we’ll traverse the first element’s child nodes and print their content:

@Test
public void whenTraverseChildNodes_thenSuccess() {
    Node first = doc.getElementsByTagName("tutorial").item(0);
    NodeList nodeList = first.getChildNodes();
    int n = nodeList.getLength();
    for (int i = 0; i < n; i++) {
        Node current = nodeList.item(i);
        if (current.getNodeType() == Node.ELEMENT_NODE) {
            System.out.println(
              current.getNodeName() + ": " + current.getTextContent());
        }
    }
}

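First, we get the NodeList using the getChildNodes() method, then iterate through it and print the name and text content of each child element. The ELEMENT_NODE check skips the whitespace-only text nodes that sit between the elements of an indented document.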

The output will show the contents of the first “tutorial” element in our document:

title: Guava
description: Introduction to Guava
date: 04/04/2016
author: GuavaAuthor
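To see why the traversal filters on ELEMENT_NODE, the sketch below counts all child nodes of an indented element and then only the element children. It uses a shortened inline version of the “tutorial” element; the ChildNodesSketch class is a hypothetical name for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildNodesSketch {

    // Parses XML from a string so the sketch does not depend on a file on disk.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
              .newDocumentBuilder()
              .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns {all child nodes, element child nodes} for an indented element.
    static int[] counts() {
        String xml = "<tutorial>\n"
          + "  <title>Guava</title>\n"
          + "  <author>GuavaAuthor</author>\n"
          + "</tutorial>";
        NodeList children = parse(xml).getDocumentElement().getChildNodes();

        int elements = 0;
        for (int i = 0; i < children.getLength(); i++) {
            // Line breaks and indentation show up as Text nodes, not elements.
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                elements++;
            }
        }
        return new int[] { children.getLength(), elements };
    }

    public static void main(String[] args) {
        int[] result = counts();
        System.out.println(result[0] + " children, " + result[1] + " elements");
        // prints "5 children, 2 elements"
    }
}
```

Without the type check, the loop in the test above would also print a line for every whitespace-only text node.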

6. Modifying the DOM

We can also make changes to the DOM.

As an example, let’s change the value of the type attribute from “java” to “other”:

@Test
public void whenModifyDocument_thenModified() {
    NodeList nodeList = doc.getElementsByTagName("tutorial");
    Element first = (Element) nodeList.item(0);

    assertEquals("java", first.getAttribute("type")); 
    
    first.setAttribute("type", "other");
    assertEquals("other", first.getAttribute("type"));     
}

Here, changing the attribute value is a simple matter of calling an Element’s setAttribute() method.
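The same method also adds an attribute that doesn’t exist yet. A small sketch of both cases follows; the “level” attribute and the ModifySketch class are made up for illustration and aren’t part of the tutorial’s document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ModifySketch {

    // Parses XML from a string so the sketch does not depend on a file on disk.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
              .newDocumentBuilder()
              .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns {"level" before it exists, updated "type", newly added "level"}.
    static String[] modify() {
        Document doc = parse("<tutorial type=\"java\"/>");
        Element tutorial = doc.getDocumentElement();

        // getAttribute() returns an empty string for a missing attribute.
        String levelBefore = tutorial.getAttribute("level");

        tutorial.setAttribute("type", "other");      // overwrites the existing value
        tutorial.setAttribute("level", "beginner");  // hypothetical attribute, created on the fly

        return new String[] {
            levelBefore,
            tutorial.getAttribute("type"),
            tutorial.getAttribute("level")
        };
    }

    public static void main(String[] args) {
        String[] values = modify();
        System.out.println("[" + values[0] + "] " + values[1] + " " + values[2]);
        // prints "[] other beginner"
    }
}
```

These changes only affect the in-memory tree; the source file stays untouched until the document is written back out.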

7. Creating a New Document

Besides modifying the DOM, we can also create new XML documents from scratch.

Let’s first have a look at the file we want to create:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<users>
    <user id="1">
        <email>[email protected]</email>
    </user>
</users>

Our XML contains a users root node with one user element that also has a child node email.

To achieve this, we first have to call the DocumentBuilder’s newDocument() method, which returns an empty Document object.

Then, we’ll call the createElement() method of the new object:

@Test
public void whenCreateNewDocument_thenCreated() throws Exception {
    Document newDoc = builder.newDocument();
    Element root = newDoc.createElement("users");
    newDoc.appendChild(root);

    Element first = newDoc.createElement("user");
    root.appendChild(first);
    first.setAttribute("id", "1");

    Element email = newDoc.createElement("email");
    email.appendChild(newDoc.createTextNode("[email protected]"));
    first.appendChild(email);

    assertEquals(1, newDoc.getChildNodes().getLength());
    assertEquals("users", newDoc.getChildNodes().item(0).getNodeName());
}

Note that createElement() only creates a node. To attach it to the tree, we also call appendChild() on its parent.

8. Saving a Document

After modifying our document or creating one from scratch, we’ll need to save it in a file.

We’ll start with creating a DOMSource object, then use a simple Transformer to save the document in a file:

private void saveDomToFile(Document document, String fileName) 
  throws Exception {
 
    DOMSource dom = new DOMSource(document);
    Transformer transformer = TransformerFactory.newInstance()
      .newTransformer();

    StreamResult result = new StreamResult(new File(fileName));
    transformer.transform(dom, result);
}

Similarly, we can print our document to the console:

private void printDom(Document document) throws Exception {
    DOMSource dom = new DOMSource(document);
    Transformer transformer = TransformerFactory.newInstance()
        .newTransformer();

    transformer.transform(dom, new StreamResult(System.out));
}
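
A default Transformer doesn’t indent its output, so the result of the methods above is typically written on a single line. The sketch below shows one way to request indentation through OutputKeys.INDENT. It writes to a string instead of a file so it can run on its own, and the IndentedOutputSketch class is a hypothetical name. The indent-amount property is specific to the Xalan-derived transformer bundled with the JDK; this sketch assumes that implementation, and other implementations may ignore the property.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class IndentedOutputSketch {

    // Builds a small document and serializes it with indentation enabled.
    static String serialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
              .newDocumentBuilder()
              .newDocument();

            Element users = doc.createElement("users");
            doc.appendChild(users);

            Element user = doc.createElement("user");
            user.setAttribute("id", "1");
            users.appendChild(user);

            Transformer transformer = TransformerFactory.newInstance()
              .newTransformer();
            // Standard JAXP property: ask the serializer to add line breaks.
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            // Implementation-specific property (assumption: JDK's built-in transformer).
            transformer.setOutputProperty(
              "{http://xml.apache.org/xslt}indent-amount", "4");

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize());
    }
}
```

The exact whitespace in the output can vary between JDK versions, so tests are usually better off asserting on the parsed structure than on the raw string.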