Encoding Special Characters in XML

Last updated: August 6, 2025

Written by: Graham Cox



1. Introduction

In this article, we’re going to explore XML entities, what they are, and what they can do for us. In particular, we’ll see what entities exist as standard with XML and how we can define our own if necessary.

2. How Is XML Structured?

XML is a markup format for representing arbitrary data. It does this using a hierarchical structure of XML elements, each of which can have attributes. For example:

<part number="1976">
    <name>Windscreen Wiper</name>
</part>

This shows an element called “part” that has one attribute – “number” – and one nested element – “name”.

Notably, the XML language uses some special characters to manage this. For example, an element always starts with a less-than sign – “<” and ends with a greater-than sign – “>”.

However, because these characters have a special meaning to XML, we can’t use them directly within our content. Doing so would be ambiguous at best and outright unparsable at worst.

For example, if we were to try to use XML to represent a simple math equation, we might write:

<math> 1 < x > 5 </math>

This attempts to express two comparisons on x: that 1 is less than x and that x is greater than 5. However, an XML processor can’t know that the intention isn’t to have an element “x” in between the two numbers.

3. Standard XML Entities

XML solves this problem through the use of XML Entities. These are special sequences that instead represent other characters.

XML entities always start with an ampersand character – “&” – and end with a semicolon character – “;”. The name of the entity is then between these two characters. For example, the entity “&lt;” is used to represent the less-than character – “<”.

There’s a set of five standard entities that are necessary to represent the characters with special meaning to XML:

Entity	Character Represented
&amp;	Ampersand – &
&apos;	Apostrophe – '
&gt;	Greater-than sign – >
&lt;	Less-than sign – <
&quot;	Quotation mark – "

Knowing this, our above attempt to represent a math equation would become:

<math> 1 &lt; x &gt; 5 </math>

Suddenly, there’s no ambiguity in how to understand this.
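
When we generate XML from code, this substitution is usually automated. The following is a minimal Java sketch, assuming a hypothetical helper class named XmlEscaper that isn't part of any library. It replaces each of the five special characters with its standard entity:

```java
// Hypothetical helper for illustration; real projects would normally
// rely on the escaping built into their XML library.
class XmlEscaper {

    // Replaces the five characters with special meaning in XML
    // by their predefined entities.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: <math> 1 &lt; x &gt; 5 </math>
        System.out.println("<math>" + escape(" 1 < x > 5 ") + "</math>");
    }
}
```

The ampersand needs no special ordering here because each input character is examined exactly once, so an “&” that we've just written as part of an entity is never escaped a second time.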

4. Character Entities

In addition to the above, XML also offers the ability to represent arbitrary Unicode characters. We do this by directly referencing the Unicode code point in decimal or hexadecimal form.

These character references use the same syntax as the standard XML entities, meaning that they’re prefixed with an “&” character and suffixed with a “;” character. Decimal code points are then prefixed with a “#” character and hexadecimal ones with “#x”.

For example, the character “÷” is the division sign. Unicode represents this as the code point U+00F7. As such, we can represent this in XML as “&#247;” or as “&#xF7;”.

This is especially useful if we aren’t using a Unicode character set to encode our XML documents – for example, if we’re using ISO-8859-1 instead – but still want to represent Unicode characters. It can also be useful to represent certain special characters, such as non-printing or combining characters so that a developer reading it can see they’re present.

Finally, XML 1.1 lets us use this to represent most control characters that otherwise can’t be present in the document, for example U+0001. XML 1.0 is stricter and allows only tab, line feed, and carriage return from this range, even as character references. In both versions, U+0000 (the Nul character) is forbidden in any form.
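
To make the two forms concrete, here's a small Java sketch. The class CharacterReferences and its method names are our own, chosen only for illustration. It builds the decimal and hexadecimal references for a code point and uses them to write non-ASCII text in a way that survives a non-Unicode encoding:

```java
// Hypothetical helper for illustration only.
class CharacterReferences {

    // Decimal form, e.g. U+00F7 becomes "&#247;".
    static String decimal(int codePoint) {
        return "&#" + codePoint + ";";
    }

    // Hexadecimal form, e.g. U+00F7 becomes "&#xF7;".
    static String hex(int codePoint) {
        return String.format("&#x%X;", codePoint);
    }

    // Keeps ASCII as-is and replaces every other code point
    // with its decimal character reference.
    static String asciiOnly(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append(decimal(cp));
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decimal(0xF7));            // &#247;
        System.out.println(hex(0xF7));                // &#xF7;
        System.out.println(asciiOnly("6 \u00F7 2"));  // 6 &#247; 2
    }
}
```

Note that this sketch doesn't escape the five special characters, so in practice we'd combine it with entity escaping.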

5. Custom Entities

It’s also possible for us to define our own XML entities. This lets us specify an entity name of our choosing and define the value that it’ll be replaced with. This can help if we have certain values that are repetitive and that we need to manage easily, but it does open up some potential security risks if used carelessly.

We need to use a Document Type Definition (DTD) to define custom entities. This is a declaration placed before the root element of the XML document that can be used to define its structure – similar to an XSD. We do this with the “<!DOCTYPE name […]>” construct. A validating parser expects “name” to match the root element, while a non-validating parser accepts any name, as in these examples:

<!DOCTYPE example [
    ....
]>
<part number="1976">
    <name>Windscreen Wiper</name>
</part>

Inside this construct, we include the DTD definition. This can include, among other things, custom entity definitions – either as internal or external entities.

5.1. Internal Entities

An internal entity is defined directly in line, giving it a name and a value. Once this is done, an entity of this name can be used as-is and treated as any other entity. For example:

<!DOCTYPE example [
    <!ENTITY windscreen "Windscreen Wiper">
]>
<part number="1976">
    <name>&windscreen;</name>
</part>

Here, we’ve defined a custom entity named “windscreen” with a replacement value of “Windscreen Wiper”. We use this with “&windscreen;”. Our XML processor will replace this with the “Windscreen Wiper” value.
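
We can observe this replacement by parsing the document. The sketch below assumes the JDK's built-in DOM parser with its default settings, which accept an internal DTD and expand entity references. The class name InternalEntityDemo is our own:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustration only: default factory settings are assumed, so this
// should only ever be pointed at trusted documents.
class InternalEntityDemo {

    static final String XML =
        "<!DOCTYPE example [\n"
      + "    <!ENTITY windscreen \"Windscreen Wiper\">\n"
      + "]>\n"
      + "<part number=\"1976\">\n"
      + "    <name>&windscreen;</name>\n"
      + "</part>";

    // Parses the document and returns the text of the first <name> element.
    static String partName(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("name").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // The entity reference has been replaced by its value.
        System.out.println(partName(XML)); // Windscreen Wiper
    }
}
```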

5.2. External Entities

External entities work the same, but instead of providing the value directly in the DTD, we provide the location to find it. For example:

<!DOCTYPE example [
    <!ENTITY windscreen SYSTEM "http://example.com/parts/windscreen.txt">
]>
<part number="1976">
    <name>&windscreen;</name>
</part>

Here, we have defined a custom entity with the name “windscreen” and the replacement value of whatever is found at the URL “http://example.com/parts/windscreen.txt”. We can use this exactly as before, and an XML processor that's configured to resolve external entities will fetch this external resource to include when needed.

5.3. Potential Security Risks

Using custom entities can be powerful but can also open us to some potential security risks. In particular, if we’re processing XML documents that are provided by untrusted sources, then we need to be especially careful.

The most obvious attack here is XML External Entity (XXE) injection. This is where someone can craft an XML document that will maliciously load a resource the attacker shouldn’t have access to. For example:

<!DOCTYPE example [
    <!ENTITY windscreen SYSTEM "file:///etc/passwd">
]>
<part number="1976">
    <name>&windscreen;</name>
</part>

This XML document declares a custom entity whose replacement value is the contents of the system password file. Obviously, this isn’t something that should be possible, but if we’re not careful, then an attacker could do exactly this.

Another potential attack is sometimes known as an XML Bomb. This is a DoS attack that uses the repetitive expansion of XML entities:

<!DOCTYPE test [
    <!ENTITY a0 "someLargeString">
    <!ENTITY a1 "&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;">
    <!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;">
    <!ENTITY a3 "&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;">
    <!ENTITY a4 "&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;">
]>
<document>&a4;</document>

Here, we have our “&a4;” entity. This expands to 10 instances of “&a3;”, each of which expands to 10 instances of “&a2;”, and so on. This results in our document including 10,000 instances of “someLargeString”. If our attacker went even further, we could get significantly more – going 10 levels deep would give us 10,000,000,000 instances, which at 15 bytes each would be 150 GB (about 140 GiB) in size.
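
The growth is easy to verify with a little arithmetic. This is a small Java sketch with a hypothetical class name, assuming every level repeats the previous one exactly 10 times and that each character takes one byte:

```java
// Illustration only: computes how large a nested-entity payload becomes.
class EntityExpansionSize {

    // Number of copies of the base string after the given number
    // of tenfold nesting levels.
    static long copies(int levels) {
        long n = 1;
        for (int i = 0; i < levels; i++) {
            n = Math.multiplyExact(n, 10L); // fails loudly on overflow
        }
        return n;
    }

    // Total characters produced once everything is expanded.
    static long expandedChars(String base, int levels) {
        return Math.multiplyExact(copies(levels), (long) base.length());
    }

    public static void main(String[] args) {
        String base = "someLargeString"; // 15 characters
        System.out.println(copies(4));               // 10000
        System.out.println(expandedChars(base, 4));  // 150000
        System.out.println(expandedChars(base, 10)); // 150000000000
    }
}
```

The last figure is 150,000,000,000 characters, produced from a source document of only a few hundred bytes.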

In general, the most reliable way to avoid these risks is to disable DTD processing, and with it custom entities, entirely in the XML processor. Many processors can alternatively disable only external entities or cap the amount of entity expansion. However, disabling entities removes the benefits that are gained from them as well. If we’re processing XML documents from untrusted sources, then this risk is likely not worth the benefit, but for internal documents, it might be beneficial.
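
As one possible way to do this in Java, the sketch below assumes the JDK's built-in (Xerces-based) DOM parser, which recognizes the “disallow-doctype-decl” feature. Other parser implementations may need different settings. The class and method names are our own:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Illustration only: rejects any document that contains a DOCTYPE,
// which rules out both XXE and entity-expansion attacks.
class HardenedParser {

    static final String XXE =
        "<!DOCTYPE example [\n"
      + "    <!ENTITY windscreen SYSTEM \"file:///etc/passwd\">\n"
      + "]>\n"
      + "<part number=\"1976\"><name>&windscreen;</name></part>";

    static DocumentBuilderFactory hardenedFactory()
            throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // With this feature on, a DOCTYPE declaration is a fatal error.
        factory.setFeature(
            "http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory;
    }

    // Returns true if the hardened parser refuses the document.
    static boolean isRejected(String xml) {
        try {
            hardenedFactory().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            return true; // the parser refused the document
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected failure", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected(XXE)); // true
        System.out.println(isRejected("<part number=\"1976\"/>")); // false
    }
}
```

Because the parser stops at the DOCTYPE declaration, the file named in the entity is never read. The trade-off is that legitimate documents using custom entities are rejected too.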

6. Conclusion

In this article, we’ve seen how we can use XML entities in our XML documents to represent special characters. We also learned how we can define our own entities if necessary, and what security risks they can introduce.
