Generic Top

I just announced the new Learn Spring course, focused on the fundamentals of Spring 5 and Spring Boot 2:

>> CHECK OUT THE COURSE

1. Introduction

Groovy provides a substantial number of methods dedicated to traversing and manipulating XML content.

In this tutorial, we'll demonstrate how to add, edit, or delete elements from XML in Groovy using various approaches. We'll also show how to create an XML structure from scratch.

2. Defining the Model

Let's define an XML structure in our resources directory that we'll use throughout our examples:

<articles>
    <article>
        <title>First steps in Java</title>
        <author id="1">
            <firstname>Siena</firstname>
            <lastname>Kerr</lastname>
        </author>
        <release-date>2018-12-01</release-date>
    </article>
    <article>
        <title>Dockerize your SpringBoot application</title>
        <author id="2">
            <firstname>Jonas</firstname>
            <lastname>Lugo</lastname>
        </author>
        <release-date>2018-12-01</release-date>
    </article>
    <article>
        <title>SpringBoot tutorial</title>
        <author id="3">
            <firstname>Daniele</firstname>
            <lastname>Ferguson</lastname>
        </author>
        <release-date>2018-06-12</release-date>
    </article>
    <article>
        <title>Java 12 insights</title>
        <author id="1">
            <firstname>Siena</firstname>
            <lastname>Kerr</lastname>
        </author>
        <release-date>2018-07-22</release-date>
    </article>
</articles>

And read it into an InputStream variable:

def xmlFile = getClass().getResourceAsStream("articles.xml")

3. XmlParser

Let's start exploring this stream with the XmlParser class.

3.1. Reading

Reading and parsing an XML file is probably the most common XML operation a developer will have to do. The XmlParser provides a very straightforward interface meant for exactly that:

def articles = new XmlParser().parse(xmlFile)

At this point, we can access the attributes and values of XML structure using GPath expressions. 

Let's now implement a simple test using Spock to check whether our articles object is correct:

def "Should read XML file properly"() {
    given: "XML file"

    when: "Using XmlParser to read file"
    def articles = new XmlParser().parse(xmlFile)

    then: "Xml is loaded properly"
    articles.'*'.size() == 4
    articles.article[0].author.firstname.text() == "Siena"
    articles.article[2].'release-date'.text() == "2018-06-12"
    articles.article[3].title.text() == "Java 12 insights"
    articles.article.find { it.author.'@id'.text() == "3" }.author.firstname.text() == "Daniele"
}

To understand how to access XML values and how to use the GPath expressions, let's focus for a moment on the internal structure of the result of the XmlParser#parse operation.

The articles object is an instance of groovy.util.Node. Every Node consists of a name, attributes map, value, and parent (which can be either null or another Node).

In our case, the value of articles is a groovy.util.NodeList instance, which is a wrapper class for a collection of Nodes. The NodeList extends the java.util.ArrayList class, which provides extraction of elements by index. To obtain a string value of a Node, we use groovy.util.Node#text().

In the above example, we introduced a few GPath expressions:

  • articles.article[0].author.firstname — get the author's first name for the first article – articles.article[n] would directly access the nth article
  • ‘*'  — get a list of article‘s children – it's the equivalent of groovy.util.Node#children()
  • author.'@id' — get the author element's id attribute – author.'@attributeName' accesses the attribute value by its name (the equivalents are: author[‘@id'] and [email protected])

3.2. Adding a Node

Similar to the previous example, let's read the XML content into a variable first. This will allow us to define a new node and add it to our articles list using groovy.util.Node#append.

Let's now implement a test which proves our point:

def "Should add node to existing xml using NodeBuilder"() {
    given: "XML object"
    def articles = new XmlParser().parse(xmlFile)

    when: "Adding node to xml"
    def articleNode = new NodeBuilder().article(id: '5') {
        title('Traversing XML in the nutshell')
        author {
            firstname('Martin')
            lastname('Schmidt')
        }
        'release-date'('2019-05-18')
    }
    articles.append(articleNode)

    then: "Node is added to xml properly"
    articles.'*'.size() == 5
    articles.article[4].title.text() == "Traversing XML in the nutshell"
}

As we can see in the above example, the process is pretty straightforward.

Let's also notice that we used groovy.util.NodeBuilder, which is a neat alternative to using the Node constructor for our Node definition.

3.3. Modifying a Node

We can also modify the values of nodes using the XmlParser. To do so, let's once again parse the content of the XML file. Next, we can edit the content node by changing the value field of the Node object.

Let's remember that while XmlParser uses the GPath expressions, we always retrieve the instance of the NodeList, so to modify the first (and only) element, we have to access it using its index.

Let's check our assumptions by writing a quick test:

def "Should modify node"() {
    given: "XML object"
    def articles = new XmlParser().parse(xmlFile)

    when: "Changing value of one of the nodes"
    articles.article.each { it.'release-date'[0].value = "2019-05-18" }

    then: "XML is updated"
    articles.article.findAll { it.'release-date'.text() != "2019-05-18" }.isEmpty()
}

In the above example, we've also used the Groovy Collections API to traverse the NodeList.

3.4. Replacing a Node

Next, let's see how to replace the whole node instead of just modifying one of its values.

Similarly to adding a new element, we'll use the NodeBuilder for the Node definition and then replace one of the existing nodes within it using groovy.util.Node#replaceNode:

def "Should replace node"() {
    given: "XML object"
    def articles = new XmlParser().parse(xmlFile)

    when: "Adding node to xml"
    def articleNode = new NodeBuilder().article(id: '5') {
        title('Traversing XML in the nutshell')
        author {
            firstname('Martin')
            lastname('Schmidt')
        }
        'release-date'('2019-05-18')
    }
    articles.article[0].replaceNode(articleNode)

    then: "Node is added to xml properly"
    articles.'*'.size() == 4
    articles.article[0].title.text() == "Traversing XML in the nutshell"
}

3.5. Deleting a Node

Deleting a node using the XmlParser is quite tricky. Although the Node class provides the remove(Node child) method, in most cases, we wouldn't use it by itself.

Instead, we'll show how to delete a node whose value fulfills a given condition.

By default, accessing the nested elements using a chain of Node.NodeList references returns a copy of the corresponding children nodes. Because of that, we can't use the java.util.NodeList#removeAll method directly on our article collection.

To delete a node by a predicate, we have to find all nodes matching our condition first, and then iterate through them and invoke java.util.Node#remove method on the parent each time.

Let's implement a test that removes all articles whose author has an id other than 3:

def "Should remove article from xml"() {
    given: "XML object"
    def articles = new XmlParser().parse(xmlFile)

    when: "Removing all articles but the ones with id==3"
    articles.article
      .findAll { it.author.'@id'.text() != "3" }
      .each { articles.remove(it) }

    then: "There is only one article left"
    articles.children().size() == 1
    articles.article[0].author.'@id'.text() == "3"
}

As we can see, as a result of our remove operation, we received an XML structure with only one article, and its id is 3.

4. XmlSlurper

Groovy also provides another class dedicated to working with XML. In this section, we'll show how to read and manipulate the XML structure using the XmlSlurper.

4.1. Reading

As in our previous examples, let's start with parsing the XML structure from a file:

def "Should read XML file properly"() {
    given: "XML file"

    when: "Using XmlSlurper to read file"
    def articles = new XmlSlurper().parse(xmlFile)

    then: "Xml is loaded properly"
    articles.'*'.size() == 4
    articles.article[0].author.firstname == "Siena"
    articles.article[2].'release-date' == "2018-06-12"
    articles.article[3].title == "Java 12 insights"
    articles.article.find { it.author.'@id' == "3" }.author.firstname == "Daniele"
}

As we can see, the interface is identical to that of XmlParser. However, the output structure uses the groovy.util.slurpersupport.GPathResult, which is a wrapper class for Node. GPathResult provides simplified definitions of methods such as: equals() and toString() by wrapping Node#text(). As a result, we can read fields and parameters directly using just their names.

4.2. Adding a Node

Adding a Node is also very similar to using XmlParser. In this case, however, groovy.util.slurpersupport.GPathResult#appendNode provides a method that takes an instance of java.lang.Object as an argument. As a result, we can simplify new Node definitions following the same convention introduced by NodeBuilder:

def "Should add node to existing xml"() {
    given: "XML object"
    def articles = new XmlSlurper().parse(xmlFile)

    when: "Adding node to xml"
    articles.appendNode {
        article(id: '5') {
            title('Traversing XML in the nutshell')
            author {
                firstname('Martin')
                lastname('Schmidt')
            }
            'release-date'('2019-05-18')
        }
    }

    articles = new XmlSlurper().parseText(XmlUtil.serialize(articles))

    then: "Node is added to xml properly"
    articles.'*'.size() == 5
    articles.article[4].title == "Traversing XML in the nutshell"
}

In case we need to modify the structure of our XML with XmlSlurper, we have to reinitialize our articles object to see the results. We can achieve that using the combination of the groovy.util.XmlSlurper#parseText and the groovy.xmlXmlUtil#serialize methods.

4.3. Modifying a Node

As we mentioned before, the GPathResult introduces a simplified approach to data manipulation. That being said, in contrast to the XmlSlurper, we can modify the values directly using the node name or parameter name:

def "Should modify node"() {
    given: "XML object"
    def articles = new XmlSlurper().parse(xmlFile)

    when: "Changing value of one of the nodes"
    articles.article.each { it.'release-date' = "2019-05-18" }

    then: "XML is updated"
    articles.article.findAll { it.'release-date' != "2019-05-18" }.isEmpty()
}

Let's notice that when we only modify the values of the XML object, we don't have to parse the whole structure again.

4.4. Replacing a Node

Now let's move to replacing the whole node. Again, the GPathResult comes to the rescue. We can easily replace the node using groovy.util.slurpersupport.NodeChild#replaceNode, which extends GPathResult and follows the same convention of using the Object values as arguments:

def "Should replace node"() {
    given: "XML object"
    def articles = new XmlSlurper().parse(xmlFile)

    when: "Replacing node"
    articles.article[0].replaceNode {
        article(id: '5') {
            title('Traversing XML in the nutshell')
            author {
                firstname('Martin')
                lastname('Schmidt')
            }
            'release-date'('2019-05-18')
        }
    }

    articles = new XmlSlurper().parseText(XmlUtil.serialize(articles))

    then: "Node is replaced properly"
    articles.'*'.size() == 4
    articles.article[0].title == "Traversing XML in the nutshell"
}

As was the case when adding a node, we're modifying the structure of the XML, so we have to parse it again.

4.5. Deleting a Node

To remove a node using XmlSlurper, we can reuse the groovy.util.slurpersupport.NodeChild#replaceNode method simply by providing an empty Node definition:

def "Should remove article from xml"() {
    given: "XML object"
    def articles = new XmlSlurper().parse(xmlFile)

    when: "Removing all articles but the ones with id==3"
    articles.article
      .findAll { it.author.'@id' != "3" }
      .replaceNode {}

    articles = new XmlSlurper().parseText(XmlUtil.serialize(articles))

    then: "There is only one article left"
    articles.children().size() == 1
    articles.article[0].author.'@id' == "3"
}

Again, modifying the XML structure requires reinitialization of our articles object.

5. XmlParser vs XmlSlurper

As we showed in our examples, the usages of XmlParser and XmlSlurper are pretty similar. We can more or less achieve the same results with both. However, some differences between them can tilt the scales towards one or the other.

First of all, XmlParser always parses the whole document into the DOM-ish structure. Because of that, we can simultaneously read from and write into it. We can't do the same with XmlSlurper as it evaluates paths more lazily. As a result, XmlParser can consume more memory.

On the other hand, XmlSlurper uses more straightforward definitions, making it simpler to work with. We also need to remember that any structural changes made to XML using XmlSlurper require reinitialization, which can have an unacceptable performance hit in case of making many changes one after another.

The decision of which tool to use should be made with care and depends entirely on the use case.

6. MarkupBuilder

Apart from reading and manipulating the XML tree, Groovy also provides tooling to create an XML document from scratch. Let's now create a document consisting of the first two articles from our first example using groovy.xml.MarkupBuilder:

def "Should create XML properly"() {
    given: "Node structures"

    when: "Using MarkupBuilderTest to create xml structure"
    def writer = new StringWriter()
    new MarkupBuilder(writer).articles {
        article {
            title('First steps in Java')
            author(id: '1') {
                firstname('Siena')
                lastname('Kerr')
            }
            'release-date'('2018-12-01')
        }
        article {
            title('Dockerize your SpringBoot application')
            author(id: '2') {
                firstname('Jonas')
                lastname('Lugo')
            }
            'release-date'('2018-12-01')
        }
    }

    then: "Xml is created properly"
    XmlUtil.serialize(writer.toString()) == XmlUtil.serialize(xmlFile.text)
}

In the above example, we can see that MarkupBuilder uses the very same approach for the Node definitions we used with NodeBuilder and GPathResult previously.

To compare output from MarkupBuilder with the expected XML structure, we used the groovy.xml.XmlUtil#serialize method.

7. Conclusion

In this article, we explored multiple ways of manipulating XML structures using Groovy.

We looked at examples of parsing, adding, editing, replacing, and deleting nodes using two classes provided by Groovy: XmlParser and XmlSlurper. We also discussed differences between them and showed how we could build an XML tree from scratch using MarkupBuilder.

As always, the complete code used in this article is available over on GitHub.

Generic bottom

I just announced the new Learn Spring course, focused on the fundamentals of Spring 5 and Spring Boot 2:

>> CHECK OUT THE COURSE
Comments are closed on this article!