Last updated: March 18, 2024
When we work with XML, we can use XPath to navigate through elements and attributes in the XML document.
In this tutorial, we’ll discuss how to evaluate XPath expressions under the Linux command line.
First of all, let’s create an XML document, books.xml, as the input XML file that we’ll use throughout this tutorial:
<books>
<book id="1" category="linux">
<title lang="en">Linux Device Drivers</title>
<year>2003</year>
<author>Jonathan Corbet</author>
<author>Alessandro Rubini</author>
</book>
<book id="2" category="linux">
<title lang="en">Understanding the Linux Kernel</title>
<year>2005</year>
<author>Daniel P. Bovet</author>
<author>Marco Cesati</author>
</book>
<book id="3" category="novel">
<title lang="en">A Game of Thrones</title>
<year>2013</year>
<author>George R. R. Martin</author>
</book>
<book id="4" category="novel">
<title lang="fr">The Little Prince</title>
<year>1990</year>
<author>Antoine de Saint-Exupéry</author>
</book>
</books>
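In our books.xml file, we have four books. Throughout this tutorial, we'll evaluate two XPath expressions against it on the Linux command line: //title[@lang='fr'], which selects the title elements whose lang attribute is fr, and //book[year>2004]/title, which selects the titles of the books published after 2004.
We're going to discuss three utilities for working with XPath on the command line: xmllint, XMLStarlet, and xidel.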
The xmllint command ships with libxml2 (some distributions put it in a separate package, such as libxml2-utils on Debian and Ubuntu). Usually, we use this command to validate, parse, or pretty-print XML files.
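As a quick illustration of those everyday uses, the sketch below wraps two standard xmllint options, --noout and --format, in a small shell function. The function name check_and_format is made up for this example.

```shell
# check_and_format is a hypothetical helper name, not part of xmllint.
# --noout parses the file and reports errors without echoing the document;
# --format re-indents the document and writes it to standard output.
check_and_format() {
    xmllint --noout "$1" && xmllint --format "$1"
}
```

We could call it as check_and_format books.xml. If the file isn't well-formed, the first xmllint call fails and nothing is pretty-printed.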
The xmllint command supports a --xpath option to evaluate XPath expressions:
xmllint --xpath "XPATH_EXPRESSION" INPUT.xml
It’s worth mentioning that, since libxml2 implements only XPath 1.0, the xmllint command supports only XPath 1.0.
Let’s test with our XPath expressions to see if we can get the expected result.
First, let’s select all title elements whose lang attribute is fr in our books.xml:
$ xmllint --xpath "//title[@lang='fr']" books.xml
<title lang="fr">The Little Prince</title>
We got the title element of the book “The Little Prince” in the output. This is correct since it’s the only title element with the lang="fr" attribute.
Second, let’s test the other XPath expression:
$ xmllint --xpath "//book[year>2004]/title" books.xml
<title lang="en">Understanding the Linux Kernel</title>
<title lang="en">A Game of Thrones</title>
This time, xmllint prints two title elements. Our second XPath expression is also correctly evaluated by the xmllint command.
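XPath 1.0 expressions can also return a string or a number instead of a node set, and xmllint prints such results as plain text. Here is a minimal sketch that takes the XML file as its argument; the function names are illustrative and not part of xmllint:

```shell
# Hypothetical helpers for this tutorial; the names are ours.

# Print the text of the first title matched by the expression, without tags.
# string() applied to a node set returns the value of its first node.
first_title_text() {
    xmllint --xpath "string(//book[year>2004]/title)" "$1"
}

# Print how many book elements the document contains.
count_books() {
    xmllint --xpath "count(//book)" "$1"
}
```

We could call these as first_title_text books.xml and count_books books.xml.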
XMLStarlet is a powerful command-line XML toolkit based on libxml2. Therefore, similar to the xmllint command, XMLStarlet only supports XPath 1.0.
XMLStarlet ships with one executable called xml, which we can use as the short form of the xmlstarlet command.
The syntax of the xml command is:
xml [options] <command> [command options]
XMLStarlet defines a set of commands to perform different XML operations — for example, ed (edit) to edit or update an XML document, tr (transform) to transform an XML document using XSLT, and so on.
To select data or query XML documents using XPath, we can use the sel (select) command. In fact, the sel command can do much more than XPath expression evaluation.
Basically, the sel command allows us to avoid writing an XSLT stylesheet to perform some XML document queries. It can generate XSLT for us from the combination of command-line options.
That is to say, when we use the sel command, XMLStarlet will convert all our command arguments into XSLT to do the query on the input XML documents.
Let’s have a look at the general syntax of the sel command:
xml sel -t <template options> Input.xml
The template is a fundamental concept of XSLT. Using the sel command, we create a template using the -t option.
In this tutorial, we won’t dive into XSLT transformation. Our goal is to evaluate XPath expressions.
The sel command supports many template options. We’ll introduce two of them: -c and -v because these two template options are pretty commonly used for XPath evaluation.
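For example, let’s say the evaluation result of an XPath expression is <element>text</element>. With the -c (copy-of) option, XMLStarlet prints a copy of the whole element, <element>text</element>. With the -v (value-of) option, it prints only the text value, text.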
Now, let’s give it a try with our two XPath expressions.
First, we’ll test our first XPath expression using the xml sel command with the -c template option:
$ xml sel -t -c "//title[@lang='fr']" books.xml
<title lang="fr">The Little Prince</title>
As the output shows, our XPath expression has been correctly evaluated, and we’ve got the expected title element.
Next, let’s have a look at what we’ll get if we use the -v template option:
$ xml sel -t -v "//title[@lang='fr']" books.xml
The Little Prince
This time, we got the text of the title element without XML tags.
Now, let’s test the command with our other XPath expression:
$ xml sel -t -c "/books/book[year>2004]/title" books.xml
<title lang="en">Understanding the Linux Kernel</title><title lang="en">A Game of Thrones</title>
When we use the -c option, the output contains the two expected title elements.
However, the output is not “pretty-printed”: the line breaks between the XML elements are missing.
This happens because the XPath expression selects only the title elements. The whitespace between elements in the input belongs to separate text nodes that aren't part of the result, so the <xsl:copy-of> instruction writes the selected elements one right after another.
Next, let’s see what we’ll get if we use the -v option:
$ xml sel -t -v "/books/book[year>2004]/title" books.xml
Understanding the Linux Kernel
A Game of Thrones
As the output shows, when we use the -v option, we’ll get the text of the matching elements, with each value on a separate line.
This time, the line breaks are not removed. That’s because when the result has multiple elements, the <xsl:value-of> will sit in a <xsl:for-each> element, something like:
<xsl:for-each select="/books/book[year>2004]/title">
<xsl:value-of select="."/>
</xsl:for-each>
Thus, the text of each matching element will be printed to a separate line.
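To see the stylesheet that XMLStarlet actually generates, rather than an approximation, we can pass the -C (--comp) option to the sel command, which prints the generated XSLT instead of running it. The sketch below assumes the executable may be installed as either xmlstarlet or xml, depending on the distribution; the function names xml_cmd and show_xslt are ours.

```shell
# xml_cmd is a hypothetical wrapper: run whichever executable name
# is installed (xmlstarlet on some distributions, xml on others).
xml_cmd() {
    if command -v xmlstarlet >/dev/null 2>&1; then
        xmlstarlet "$@"
    else
        xml "$@"
    fi
}

# show_xslt is a hypothetical helper: print the XSLT that sel generates
# for a -v (value-of) template built from the given XPath expression.
show_xslt() {
    xml_cmd sel -C -t -v "$1"
}
```

For instance, show_xslt "/books/book[year>2004]/title" prints the stylesheet behind the -v example above, which lets us check how the line breaks between values are produced.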
The xidel command is a nice XML/HTML/JSON data extraction utility and supports XPath 3.0.
Extracting data using the xidel command with an XPath expression is pretty straightforward:
xidel [options] --xpath "XPath Expression" XML_INPUT
We can pass some options to control the output, as we’ll see in later examples.
Let’s try the xidel command with our first XPath expression:
$ xidel --xpath "//title[@lang='fr']" books.xml
**** Retrieving: books.xml ****
**** Processing: books.xml ****
The Little Prince
As we can see in the output, xidel prints status information by default. Also, it extracts the text out of the found elements automatically.
If we want to skip the status messages, we can add the -s option to let xidel work in “silent” mode.
Moreover, we can ask xidel to print the complete XML elements by passing the --printed-node-format="xml" option.
One nice feature of the xidel command is that when its output is in XML format, it highlights the attributes in the console output.
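Combining the two options mentioned so far, a call might look like the sketch below. It uses only the -s and --printed-node-format options from this tutorial; the wrapper name print_nodes_as_xml is made up for illustration.

```shell
# print_nodes_as_xml is a hypothetical wrapper name.
# -s suppresses the "Retrieving"/"Processing" status lines;
# --printed-node-format="xml" prints whole elements instead of their text.
print_nodes_as_xml() {
    xidel -s --printed-node-format="xml" --xpath "$1" "$2"
}
```

With our sample file, we would run it as print_nodes_as_xml "//title[@lang='fr']" books.xml.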
Next, let’s execute the xidel command with our second XPath expression:
$ xidel -s --printed-node-format="xml" --xpath "/books/book[year>2004]/title" books.xml
<title lang="en">Understanding the Linux Kernel</title>
<title lang="en">A Game of Thrones</title>
As we expected, it prints the two title elements from our sample file.
Finally, let’s test if the xidel command can work with XPath 3.0 expressions.
Sequences were introduced in XPath 2.0 and remain part of XPath 3.0, whereas XPath 1.0 has no such type. So, we’ll write an XPath expression that uses a sequence to select the book elements whose publishing year is in a given list of values: //book[year=(2004, 2005, 2013, 2020)]
Let’s see if xidel can evaluate this XPath expression and find the books we’re interested in:
$ xidel -s --printed-node-format="xml" --xpath "//book[year=(2004, 2005, 2013, 2020)]" books.xml
<book id="2" category="linux">
<title lang="en">Understanding the Linux Kernel</title>
<year>2005</year>
<author>Daniel P. Bovet</author>
<author>Marco Cesati</author>
</book>
<book id="3" category="novel">
<title lang="en">A Game of Thrones</title>
<year>2013</year>
<author>George R. R. Martin</author>
</book>
Great, it works with the XPath 3.0 expression!
Since xmllint and XMLStarlet only support XPath 1.0, they cannot evaluate this XPath expression:
$ xmllint --xpath "//book[year=(2004, 2005, 2013, 2020)]" books.xml
XPath error : Invalid expression
//book[year=(2004, 2005, 2013, 2020)]
^
XPath evaluation failure
$ xml sel -t -c "//book[year=(2004, 2005, 2013, 2020)]" books.xml
Invalid expression: //book[year=(2004, 2005, 2013, 2020)]
compilation error: element copy-of
xsl:copy-of : could not compile select expression '//book[year=(2004, 2005, 2013, 2020)]'
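With XPath 1.0 tools, we can still express the same query without a sequence by chaining the comparisons with or. Here is a sketch using xmllint; the helper name is illustrative:

```shell
# books_in_years_xpath1 is a hypothetical helper name.
# XPath 1.0 has no sequence literal, so each year is compared separately.
books_in_years_xpath1() {
    xmllint --xpath "//book[year=2004 or year=2005 or year=2013 or year=2020]" "$1"
}
```

Running books_in_years_xpath1 books.xml should select the same two book elements as the xidel command above. The same expression can also be passed to xml sel -t -c.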
In this article, we’ve introduced how to evaluate XPath expressions under the Linux command line.
We’ve addressed three different utilities to do the job through examples.