1. Overview

When we read an XML file by hand, we usually want to see its content in a pretty-printed format. Many text editors and IDEs can reformat XML documents, and on Linux, we can pretty-print XML files from the command line.

However, sometimes we need to convert a raw XML string to a pretty-printed format inside our Java program. For example, we may want to show a pretty-printed XML document in the user interface so that it's easier to read.

In this tutorial, we’ll explore how to pretty-print XML in Java.

2. Introduction to the Problem

For simplicity, we’ll take a non-formatted emails.xml file as the input:

<emails> <email> <from>Kai</from> <to>Amanda</to> <time>2018-03-05</time>
<subject>I am flying to you</subject></email> <email>
<from>Jerry</from> <to>Tom</to> <time>1992-08-08</time> <subject>Hey Tom, catch me if you can!</subject>
</email> </emails>

As we can see, the emails.xml file is well-formed. However, it’s not easy to read due to the messy format.

Our goal is to create a method to convert this ugly, raw XML string to a pretty-formatted string.

Further, we'll discuss customizing two common output options: the indent size (an integer) and whether to suppress the XML declaration (a boolean).

The indent size is straightforward: it's the number of spaces to indent per nesting level. The suppress-declaration option decides whether the generated XML includes the XML declaration. A typical XML declaration looks like:

<?xml version="1.0" encoding="UTF-8"?>

In this tutorial, we’ll address a solution with the standard Java API and another approach using an external library.

Next, let’s see them in action.

3. Pretty-Printing XML With the Transformer Class

The standard Java API provides the Transformer class for XML transformations.

3.1. Using the Default Transformer

First, let’s see the pretty-print solution using the Transformer class:

public static String prettyPrintByTransformer(String xmlString, int indent, boolean ignoreDeclaration) {

    try {
        InputSource src = new InputSource(new StringReader(xmlString));
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src);

        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        transformerFactory.setAttribute("indent-number", indent);
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, ignoreDeclaration ? "yes" : "no");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        Writer out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    } catch (Exception e) {
        throw new RuntimeException("Error occurs when pretty-printing xml:\n" + xmlString, e);
    }
}

Now, let’s walk through the method quickly and figure out how it works:

  • First, we parse the raw XML string and get a Document object.
  • Next, we obtain a TransformerFactory instance and set the indent size through its indent-number attribute.
  • Then, we get a default transformer instance from the configured transformerFactory object.
  • The transformer object supports various output properties. To decide if we want to skip the declaration, we set the OutputKeys.OMIT_XML_DECLARATION property.
  • Finally, since we want a pretty-formatted String object, we transform() the parsed XML Document to a StringWriter and return the transformed String.

We’ve set the indent size on the TransformerFactory object in the method above. Alternatively, we can also define the indent-amount property on the transformer instance:

transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", String.valueOf(indent));
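To see this alternative in a complete form, here's a minimal, self-contained sketch. The class name IndentAmountSketch and the prettyPrint helper are our own illustrative names, not part of the method above, and the input is deliberately free of whitespace between elements so that the result doesn't depend on how a particular Java version treats whitespace-only nodes:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class IndentAmountSketch {

    // Hypothetical helper: same flow as prettyPrintByTransformer, but the indent
    // size is set on the Transformer instead of the TransformerFactory.
    static String prettyPrint(String xml, int indent, boolean omitDeclaration) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
              .newDocumentBuilder()
              .parse(new InputSource(new StringReader(xml)));

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitDeclaration ? "yes" : "no");
            // Indent size as an output property rather than a factory attribute
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", String.valueOf(indent));

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException("Error occurs when pretty-printing xml:\n" + xml, e);
        }
    }

    public static void main(String[] args) {
        // Input without whitespace between elements (an assumption of this sketch)
        String xml = "<emails><email><from>Kai</from></email></emails>";
        System.out.println(prettyPrint(xml, 2, true));   // no XML declaration
        System.out.println(prettyPrint(xml, 4, false));  // with XML declaration
    }
}
```

The two calls in main only differ in the indent size and in whether the XML declaration is kept, which are exactly the two options we want to customize.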

Next, let’s test if the method works as expected.

3.2. Testing the Method

Our Java project is a Maven project, and we've put the input file under src/main/resources/xml/emails.xml. We've created the readFromInputStream method to read the input file as a String, but we won't go into the details of this method since it doesn't have much to do with our topic here. Let's say we want to set the indent size to 2 and skip the XML declaration in the result:

public static void main(String[] args) throws IOException {
    InputStream inputStream = XmlPrettyPrinter.class.getResourceAsStream("/xml/emails.xml");
    String xmlString = readFromInputStream(inputStream);
    System.out.println("Pretty printing by Transformer");
    System.out.println("=============================================");
    System.out.println(prettyPrintByTransformer(xmlString, 2, true));
}

As the main method shows, we read the input file as a String and then call our prettyPrintByTransformer method to get a pretty-printed XML String.
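The article doesn't show readFromInputStream, so the following is only one possible sketch of such a helper. The class name ResourceReaderSketch, the UTF-8 charset, and the "\n" line separator are assumptions made for illustration:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class ResourceReaderSketch {

    // Hypothetical implementation: reads the whole stream as UTF-8 text
    // and joins the lines with "\n".
    static String readFromInputStream(InputStream inputStream) {
        try (BufferedReader reader = new BufferedReader(
          new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An in-memory stream stands in for the classpath resource here
        InputStream in = new ByteArrayInputStream(
          "<emails> <email>\n<from>Kai</from> </email> </emails>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFromInputStream(in));
    }
}
```

Any other way of turning the resource into a String would work just as well, since the pretty-print method only needs the raw XML text.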

Next, let’s run the main method with Java 8:

Pretty printing by Transformer
=============================================
<emails>
  <email>
    <from>Kai</from>
    <to>Amanda</to>
    <time>2018-03-05</time>
    <subject>I am flying to you</subject>
  </email>
  <email>
    <from>Jerry</from>
    <to>Tom</to>
    <time>1992-08-08</time>
    <subject>Hey Tom, catch me if you can!</subject>
  </email>
</emails>

As the output above shows, our method works as expected.

However, if we test it once again with Java 9 or a later version, we may see different output.

Next, let’s see what it produces if we run it with Java 9:

Pretty printing by Transformer
=============================================
<emails>
   
  <email>
     
    <from>Kai</from>
     
    <to>Amanda</to>
     
    <time>2018-03-05</time>
    
    <subject>I am flying to you</subject>
  </email>
   
  <email>
    
    <from>Jerry</from>
     
    <to>Tom</to>
     
    <time>1992-08-08</time>
     
    <subject>Hey Tom, catch me if you can!</subject>
    
  </email>
   
</emails>

=============================================

As we can see in the output above, there are unexpected empty lines in the output.

This is because our raw input contains whitespace between elements, for example:

<emails> <email> <from>Kai</from> ...

As of Java 9, the Transformer class's pretty-print feature doesn't define the actual format. As a result, whitespace-only text nodes from the input are written to the output as well. This has been discussed in this JDK bug ticket. Also, Java 9's release notes explain this in the xml/jaxp section.
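To make these whitespace-only nodes visible, we can inspect the parsed DOM directly. The following sketch is for illustration only. The class and method names are hypothetical, and it relies on the default DocumentBuilderFactory settings, which keep whitespace text nodes when there's no DTD:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceNodeSketch {

    // Counts the direct children of the root element that are whitespace-only text nodes.
    static int countWhitespaceOnlyChildren(String xml) {
        try {
            NodeList children = DocumentBuilderFactory.newInstance()
              .newDocumentBuilder()
              .parse(new InputSource(new StringReader(xml)))
              .getDocumentElement()
              .getChildNodes();

            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.TEXT_NODE && child.getTextContent().trim().isEmpty()) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new RuntimeException("Error occurs when parsing xml:\n" + xml, e);
        }
    }

    public static void main(String[] args) {
        // Spaces before, between, and after the two <email/> elements: prints 3
        System.out.println(countWhitespaceOnlyChildren("<emails> <email/> <email/> </emails>"));
        // No whitespace between elements: prints 0
        System.out.println(countWhitespaceOnlyChildren("<emails><email/><email/></emails>"));
    }
}
```

Each of those whitespace-only nodes is a candidate for an extra blank line once the transformer adds its own indentation around it.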

If we want our pretty-print method to always generate the same format under various Java versions, we need to provide a stylesheet file.

Next, let’s create a simple xsl file to achieve that.

3.3. Providing an XSLT File

First, let’s create the prettyprint.xsl file to define the output format:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:strip-space elements="*"/>
    <xsl:output method="xml" encoding="UTF-8"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

As we can see, in the prettyprint.xsl file, we’ve used the <xsl:strip-space/> element to remove whitespace-only nodes so that they do not appear in the output.

Next, we still need to make a small change to our method. We won’t use the default transformer anymore. Instead, we’ll create a Transformer object with our XSLT document:

Transformer transformer = transformerFactory.newTransformer(new StreamSource(new StringReader(readPrettyPrintXslt())));

Here, the readPrettyPrintXslt() method reads prettyprint.xsl content.
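For reference, here's a compact, self-contained sketch of the stylesheet-based approach. It makes a few simplifying assumptions that differ from our method: the stylesheet is inlined as a String constant instead of being read by readPrettyPrintXslt(), the raw XML is passed as a StreamSource rather than a parsed Document, and the indent size is set through the indent-amount output property. The class and method names are hypothetical:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltPrettyPrintSketch {

    // Same content as prettyprint.xsl, inlined so the sketch needs no resource file
    static final String PRETTY_PRINT_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:strip-space elements=\"*\"/>"
      + "<xsl:output method=\"xml\" encoding=\"UTF-8\"/>"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String prettyPrint(String xml, int indent) {
        try {
            // Build the transformer from the stylesheet instead of using the default one
            Transformer transformer = TransformerFactory.newInstance()
              .newTransformer(new StreamSource(new StringReader(PRETTY_PRINT_XSLT)));
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", String.valueOf(indent));

            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException("Error occurs when pretty-printing xml:\n" + xml, e);
        }
    }

    public static void main(String[] args) {
        // Whitespace between elements is stripped by the stylesheet before indenting
        System.out.println(prettyPrint("<emails> <email> <from>Kai</from> </email> </emails>", 2));
    }
}
```

Because the stylesheet strips whitespace-only nodes before the output is indented, the same input with or without spaces between elements should produce the same result.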

Now, if we test the method in Java 8 and Java 9, both produce the same output:

Pretty printing by Transformer
=============================================
<emails>
  <email>
    <from>Kai</from>
    <to>Amanda</to>
    <time>2018-03-05</time>
    <subject>I am flying to you</subject>
  </email>
...
</emails>

We've solved the problem with the standard Java API. Next, let's pretty-print the emails.xml file using an external library.

4. Pretty-Printing XML With the Dom4j Library

Dom4j is a popular XML library. It allows us to easily pretty-print XML documents.

First, let’s add the Dom4j dependency into our pom.xml:

<dependency>
    <groupId>org.dom4j</groupId>
    <artifactId>dom4j</artifactId>
    <version>2.1.3</version>
</dependency>

We’ve used the 2.1.3 version as an example. We can find the latest version in the Maven Central repository.

Next, let’s see how to pretty-print XML using the Dom4j library:

public static String prettyPrintByDom4j(String xmlString, int indent, boolean skipDeclaration) {
    try {
        OutputFormat format = OutputFormat.createPrettyPrint();
        format.setIndentSize(indent);
        format.setSuppressDeclaration(skipDeclaration);
        format.setEncoding("UTF-8");

        org.dom4j.Document document = DocumentHelper.parseText(xmlString);
        StringWriter sw = new StringWriter();
        XMLWriter writer = new XMLWriter(sw, format);
        writer.write(document);
        return sw.toString();
    } catch (Exception e) {
        throw new RuntimeException("Error occurs when pretty-printing xml:\n" + xmlString, e);
    }
}

Dom4j's OutputFormat class provides a createPrettyPrint method to create a pre-defined pretty-print OutputFormat object. As the method above shows, we can add some customizations on top of the default pretty-print format. In this case, we set the indent size and decide whether to suppress the declaration in the result.

Next, we parse the raw XML string and create an XMLWriter object with the prepared OutputFormat instance.

Finally, the XMLWriter object will write the parsed XML document in the required format.

Next, let’s test if it can pretty-print the emails.xml file. This time, let’s say we would like to include the declaration and have an indent size of 8 in the result:

System.out.println("Pretty printing by Dom4j");
System.out.println("=============================================");
System.out.println(prettyPrintByDom4j(xmlString, 8, false));

When we run the method, we’ll see the output:

Pretty printing by Dom4j
=============================================
<?xml version="1.0" encoding="UTF-8"?>

<emails> 
        <email> 
                <from>Kai</from>  
                <to>Amanda</to>  
                <time>2018-03-05</time>  
                <subject>I am flying to you</subject>
        </email>  
        <email> 
                <from>Jerry</from>  
                <to>Tom</to>  
                <time>1992-08-08</time>  
                <subject>Hey Tom, catch me if you can!</subject> 
        </email> 
</emails>

As the output above shows, the method has solved the problem.

5. Conclusion

In this article, we’ve addressed two approaches to pretty-print an XML file in Java.

We can pretty-print XML documents using the standard Java API. However, we need to keep in mind that the Transformer object may produce different results depending on the Java version. To get the same format across versions, we can provide an XSLT file that strips whitespace-only nodes.

Alternatively, the Dom4j library can solve the problem straightforwardly.

As always, the full version of the code is available over on GitHub.
