1. Overview

SAX, also known as the Simple API for XML, is used for parsing XML documents.

In this tutorial, we'll learn what SAX is and why, when and how it should be used.

2. SAX: The Simple API for XML

SAX is an API used to parse XML documents. It is based on events generated while reading through the document. Callback methods receive those events. A custom handler contains those callback methods.

The API is efficient because it discards each event as soon as the callback has handled it. As a result, SAX never needs to hold the whole document in memory, unlike DOM, for example.
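To make the event model concrete, here is a minimal sketch that records the name of every element the parser reports. The class name SaxEventTrace and the helper elementNames are made up for this illustration; only the JDK's own SAX classes are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventTrace {

    // Hypothetical helper: collects the qualified name of each start tag
    static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                // The parser pushes this event to us; we keep only what we need from it
                names.add(qName);
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<a><b/><c>text</c></a>")); // prints [a, b, c]
    }
}
```

The handler only stores what it explicitly copies out of an event, which is where the small memory footprint comes from.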

3. SAX vs DOM

DOM stands for Document Object Model. The DOM parser does not rely on events. Moreover, it loads the whole XML document into memory to parse it. SAX is more memory-efficient than DOM.

DOM has its benefits, too. For example, DOM supports XPath. It also makes it easy to operate on the whole document tree at once, since the document is loaded into memory.
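As a rough illustration of that benefit, the sketch below loads a document with the DOM parser and queries it with XPath. The class name DomXPathSketch, the helper lastTitle and the sample XML are assumptions made for this example.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomXPathSketch {

    // Hypothetical helper: returns the title of the last article in the document
    static String lastTitle(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The whole document is parsed into an in-memory tree first
        Document document = builder.parse(new InputSource(new StringReader(xml)));

        // With the full tree available, XPath can address any node, including the last one
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/baeldung/articles/article[last()]/title", document);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<baeldung><articles>"
          + "<article><title>First</title></article>"
          + "<article><title>Second</title></article>"
          + "</articles></baeldung>";
        System.out.println(lastTitle(xml)); // prints Second
    }
}
```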

4. SAX vs StAX

StAX is more recent than SAX and DOM. It stands for Streaming API for XML.

The main difference from SAX is that StAX uses a pull mechanism instead of SAX's push mechanism (using callbacks).
This means the client is in control and decides when to pull the next event. Therefore, there is no obligation to read the whole document if only a part of it is needed.
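The pull model can be sketched as follows, assuming we only care about the first title in a document. The class name StaxPullSketch and the helper firstTitle are hypothetical; the loop stops pulling events as soon as it has what it needs.

```java
import java.io.StringReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxPullSketch {

    // Hypothetical helper: returns the text of the first <title>, or null if there is none
    static String firstTitle(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
          .createXMLStreamReader(new StringReader(xml));
        try {
            // The client drives the parser by asking for the next event
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT && "title".equals(reader.getLocalName())) {
                    // Returning here means the rest of the document is never pulled
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<baeldung><articles>"
          + "<article><title>First</title></article>"
          + "<article><title>Second</title></article>"
          + "</articles></baeldung>";
        System.out.println(firstTitle(xml)); // prints First
    }
}
```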

StAX offers a simple API for working with XML while still parsing in a memory-efficient way.

Unlike SAX, it doesn't provide schema validation as one of its features.
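For reference, one way to turn on schema validation with SAX is to attach a Schema to the SAXParserFactory. The sketch below is only an illustration: the class name SaxValidationSketch, the helper isValid and the tiny inline XSD are assumptions, not part of this article's example.

```java
import java.io.StringReader;

import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SaxValidationSketch {

    // Hypothetical schema: an <article> must contain exactly one <title>
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='article'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns false for documents that are malformed or break the schema
    static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(new StreamSource(new StringReader(XSD)));

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema);

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                // DefaultHandler ignores recoverable errors, so rethrow to surface them
                throw e;
            }
        };

        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<article><title>SAX</title></article>"));
        System.out.println(isValid("<article><body>SAX</body></article>"));
    }
}
```

Overriding error() matters here because validation problems are reported through that callback, and the default implementation does nothing with them.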

5. Parsing the XML File Using a Custom Handler

Let's now use the following XML representing the Baeldung website and its articles:

<baeldung>
    <articles>
        <article>
            <title>Parsing an XML File Using SAX Parser</title>
            <content>SAX Parser's Lorem ipsum...</content>
        </article>
        <article>
            <title>Parsing an XML File Using DOM Parser</title>
            <content>DOM Parser's Lorem ipsum...</content>
        </article>
        <article>
            <title>Parsing an XML File Using StAX Parser</title>
            <content>StAX Parser's Lorem ipsum...</content>
        </article>
    </articles>
</baeldung>

We'll begin by creating POJOs for our Baeldung root element and its children:

import java.util.List;

public class Baeldung {
    private List<BaeldungArticle> articleList;
    // usual getters and setters
}

public class BaeldungArticle {
    private String title;
    private String content;
    // usual getters and setters
}

We'll continue by creating the BaeldungHandler. This class will implement the callback methods necessary to capture the events.

We'll override four methods from the superclass DefaultHandler, each characterizing an event:

    • characters(char[], int, int) receives a character array together with the boundaries of the text inside it. We'll keep that text in a field of BaeldungHandler
    • startDocument() is invoked when the parsing begins – we'll use it to construct our Baeldung instance
    • startElement() is invoked when the parsing begins for an element – we'll use it to construct either List<BaeldungArticle> or BaeldungArticle instances – qName helps us make the distinction between both types
    • endElement() is invoked when the parsing ends for an element – this is when we'll assign the content of the tags to their respective variables
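One detail about characters() is worth a small sketch: the SAX API allows the parser to deliver the text of a single element in several chunks, so a handler is safer if it appends each chunk instead of keeping only the latest one. The class name CharactersChunkSketch and the helper textOf below are hypothetical.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CharactersChunkSketch {

    // Hypothetical helper: returns all character data found in the document
    static String textOf(String xml) throws Exception {
        StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Appending works whether the text arrives in one call or in many
                text.append(ch, start, length);
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOf("<title>Tom &amp; Jerry</title>")); // prints Tom & Jerry
    }
}
```

Entity references such as &amp;amp; are a typical place where a parser may split the text, which is why the sample uses one.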

With all the callbacks defined, we can now write the BaeldungHandler class:

import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BaeldungHandler extends DefaultHandler {
    private static final String ARTICLES = "articles";
    private static final String ARTICLE = "article";
    private static final String TITLE = "title";
    private static final String CONTENT = "content";

    private Baeldung website;
    // Text of the element currently being read
    private final StringBuilder elementValue = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // The parser may deliver one element's text in several chunks, so we append
        elementValue.append(ch, start, length);
    }

    @Override
    public void startDocument() throws SAXException {
        website = new Baeldung();
    }

    @Override
    public void startElement(String uri, String lName, String qName, Attributes attr) throws SAXException {
        // Discard text collected before this tag, such as indentation
        elementValue.setLength(0);
        switch (qName) {
            case ARTICLES:
                website.setArticleList(new ArrayList<>());
                break;
            case ARTICLE:
                website.getArticleList().add(new BaeldungArticle());
                break;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        switch (qName) {
            case TITLE:
                latestArticle().setTitle(elementValue.toString());
                break;
            case CONTENT:
                latestArticle().setContent(elementValue.toString());
                break;
        }
    }

    private BaeldungArticle latestArticle() {
        List<BaeldungArticle> articleList = website.getArticleList();
        int latestArticleIndex = articleList.size() - 1;
        return articleList.get(latestArticleIndex);
    }

    public Baeldung getWebsite() {
        return website;
    }
}

String constants have also been added to increase readability. A method to retrieve the latest encountered article is also convenient. Finally, we need a getter for the Baeldung object.

Note that this handler isn't thread-safe, since it holds state between callback invocations.
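A simple way to live with that limitation is to create a fresh handler (and parser) for every parse, so that no state is shared between threads. The sketch below assumes a small stateful handler of its own; the names HandlerPerParseSketch, CountingHandler and countElements are made up for this illustration.

```java
import java.io.StringReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class HandlerPerParseSketch {

    // A stateful handler, just like BaeldungHandler: it must not be shared across parses
    static class CountingHandler extends DefaultHandler {
        private int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            elements++;
        }

        int getElements() {
            return elements;
        }
    }

    // Each call builds its own parser and handler, so concurrent calls don't interfere
    static int countElements(String xml) throws Exception {
        CountingHandler handler = new CountingHandler();
        SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getElements();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> first = pool.submit(() -> countElements("<a><b/></a>"));
            Future<Integer> second = pool.submit(() -> countElements("<a><b/><c/></a>"));
            System.out.println(first.get() + " " + second.get()); // prints 2 3
        } finally {
            pool.shutdown();
        }
    }
}
```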

6. Testing the Parser

In order to test the parser, we'll instantiate the SAXParserFactory, the SAXParser and also the BaeldungHandler:

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
SaxParserMain.BaeldungHandler baeldungHandler = new SaxParserMain.BaeldungHandler();

After that, we'll parse the XML file and assert that the object contains all expected elements parsed:

saxParser.parse("src/test/resources/sax/baeldung.xml", baeldungHandler);

SaxParserMain.Baeldung result = baeldungHandler.getWebsite();

assertNotNull(result);
List<SaxParserMain.BaeldungArticle> articles = result.getArticleList();

assertNotNull(articles);
assertEquals(3, articles.size());

SaxParserMain.BaeldungArticle articleOne = articles.get(0);
assertEquals("Parsing an XML File Using SAX Parser", articleOne.getTitle());
assertEquals("SAX Parser's Lorem ipsum...", articleOne.getContent());

SaxParserMain.BaeldungArticle articleTwo = articles.get(1);
assertEquals("Parsing an XML File Using DOM Parser", articleTwo.getTitle());
assertEquals("DOM Parser's Lorem ipsum...", articleTwo.getContent());

SaxParserMain.BaeldungArticle articleThree = articles.get(2);
assertEquals("Parsing an XML File Using StAX Parser", articleThree.getTitle());
assertEquals("StAX Parser's Lorem ipsum...", articleThree.getContent());

As expected, the Baeldung object has been parsed correctly and contains the expected sub-objects.

7. Conclusion

We just discovered how to use SAX to parse XML files. It's a powerful API that keeps the memory footprint of our applications small.

As usual, the code for this article is available over on GitHub.
