Compact Strings in Java 9

1. Overview

Up to Java 8, a String was internally represented by a char[] containing its characters. Every char takes up 2 bytes because Java internally uses UTF-16.

For instance, if a String contains a word in the English language, the leading 8 bits will all be 0 for every char, as an ASCII character can be represented using a single byte.

Many characters require 16 bits to represent them, but statistically most require only 8 bits, which is the LATIN-1 character representation. So there is scope to improve both memory consumption and performance.

What’s also important is that Strings typically occupy a large proportion of the JVM heap space. And because of the way they’re stored by the JVM, in most cases a String instance can take up double the space it actually needs.

In this article, we’ll discuss the Compressed String option, introduced in JDK 6, and the new Compact String, introduced with JDK 9. Both were designed to optimize the memory consumption of Strings on the JVM.

2. Compressed String – Java 6

The JDK 6 Update 21 Performance Release introduced a new VM option:

-XX:+UseCompressedStrings

When this option is enabled, Strings are stored as byte[], instead of char[] – thus, saving a lot of memory. However, this option was eventually removed in JDK 7, mainly because it had some unintended performance consequences.

3. Compact String – Java 9

Java 9 has brought the concept of compact Strings back.

This means that whenever we create a String, if all of its characters can be represented using a single byte each (the LATIN-1 representation), a byte array is used internally, with one byte per character.

Otherwise, if any character requires more than 8 bits, all the characters are stored using two bytes each (the UTF-16 representation).

So basically, whenever possible, it’ll just use a single byte for each character.
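The decision described above can be sketched in a few lines. This is only an illustration, not the JDK's actual code: the class CompactEncodingSketch and its methods fitsInLatin1 and encode are hypothetical names, and big-endian UTF-16 is assumed purely to keep the byte counts easy to read.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: picks a one-byte or two-byte encoding for a String.
class CompactEncodingSketch {

    // A char fits in LATIN-1 when its upper 8 bits are all zero.
    static boolean fitsInLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // One byte per char when possible, otherwise two bytes for every char.
    // UTF_16BE is an assumption of this sketch; it writes no byte-order mark.
    static byte[] encode(String s) {
        return fitsInLatin1(s)
          ? s.getBytes(StandardCharsets.ISO_8859_1)
          : s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(encode("compact").length);       // 7 chars, 7 bytes
        System.out.println(encode("compact\u20AC").length); // 8 chars, 16 bytes
    }
}
```

Note that a single character outside LATIN-1 (here the euro sign) is enough to switch the whole String to two bytes per character.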

Now, the question is – how will all the String operations work? How will it distinguish between the LATIN-1 and UTF-16 representations?

Well, to tackle this issue, another change was made to the internal implementation of String: a final field, coder, preserves this information.

3.1. String Implementation in Java 9

Until now, the String was stored as a char[]:

private final char[] value;

From now on, it’ll be a byte[]:

private final byte[] value;

The variable coder:

private final byte coder;

Where the coder can be:

static final byte LATIN1 = 0;
static final byte UTF16 = 1;

Most of the String operations now check the coder and dispatch to the specific implementation:

public int indexOf(int ch, int fromIndex) {
    return isLatin1() 
      ? StringLatin1.indexOf(value, ch, fromIndex) 
      : StringUTF16.indexOf(value, ch, fromIndex);
}  

private boolean isLatin1() {
    return COMPACT_STRINGS && coder == LATIN1;
}
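To make the dispatch idea concrete, here is a toy model of a coder-driven String. It is a sketch under stated assumptions, not the JDK implementation: ToyString is a hypothetical class, it stores UTF-16 in big-endian order for simplicity, and it omits bounds checks.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical toy class mimicking the value/coder layout described above.
final class ToyString {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    private final byte[] value;
    private final byte coder;

    ToyString(String s) {
        // Use one byte per char only if every char is <= 0xFF.
        boolean latin1 = s.chars().allMatch(c -> c <= 0xFF);
        this.coder = latin1 ? LATIN1 : UTF16;
        this.value = s.getBytes(
          latin1 ? StandardCharsets.ISO_8859_1 : StandardCharsets.UTF_16BE);
    }

    boolean isLatin1() {
        return coder == LATIN1;
    }

    // Dispatches on the coder, like the indexOf example above.
    char charAt(int index) {
        if (isLatin1()) {
            // One byte per char: mask to avoid sign extension.
            return (char) (value[index] & 0xFF);
        }
        // Two bytes per char, high byte first in this sketch.
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    public static void main(String[] args) {
        ToyString latin = new ToyString("caf\u00E9"); // all chars fit in LATIN-1
        ToyString wide = new ToyString("\u03C0r");    // U+03C0 needs 16 bits
        System.out.println(latin.isLatin1() + " " + (latin.charAt(3) == '\u00E9'));
        System.out.println(wide.isLatin1() + " " + (wide.charAt(0) == '\u03C0'));
    }
}
```

Callers of charAt never see which branch ran, which is the point of keeping the coder as a private field.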

With all the information the JVM needs readily available, the CompactStrings VM option is enabled by default. To disable it, we can use:

-XX:-CompactStrings
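If we want to confirm which setting a running JVM actually uses, one possible approach is to read the flag through the HotSpot diagnostic MXBean. This sketch assumes a HotSpot-based JVM on Java 9 or later; on other JVMs, or on older versions where the CompactStrings flag doesn't exist, the lookup may fail. The class name CompactStringsFlag is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads the CompactStrings VM option at runtime.
// Assumes a HotSpot JVM, version 9 or later.
class CompactStringsFlag {

    static String compactStringsSetting() {
        HotSpotDiagnosticMXBean bean =
          ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException for an unknown flag.
        return bean.getVMOption("CompactStrings").getValue();
    }

    public static void main(String[] args) {
        System.out.println("CompactStrings = " + compactStringsSetting());
    }
}
```

Running it once with default settings and once with -XX:-CompactStrings should report the two different values.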

3.2. How coder Works

In the Java 9 String class implementation, the length is calculated as:

public int length() {
    return value.length >> coder;
}

If the String contains only LATIN-1 characters, the value of coder will be 0, so the length of the String will be the same as the length of the byte array.

In other cases, if the String is in UTF-16 representation, the value of coder will be 1, and hence the length will be half the size of the actual byte array.
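The shift arithmetic is easy to check by hand. The following sketch applies the same expression to two byte arrays holding the word "Java"; the class CoderLengthSketch is a hypothetical name, and the arrays are produced with standard charsets rather than taken from a real String's internals.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the length formula applied to both representations.
class CoderLengthSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Same arithmetic as the length() method shown above.
    static int length(byte[] value, byte coder) {
        return value.length >> coder;
    }

    public static void main(String[] args) {
        byte[] latin1 = "Java".getBytes(StandardCharsets.ISO_8859_1); // 4 bytes
        byte[] utf16 = "Java".getBytes(StandardCharsets.UTF_16BE);    // 8 bytes

        System.out.println(length(latin1, LATIN1)); // 4 >> 0 = 4
        System.out.println(length(utf16, UTF16));   // 8 >> 1 = 4
    }
}
```

Both calls report 4 characters, even though the second array is twice as long.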

Note that all the changes made for Compact Strings are in the internal implementation of the String class and are fully transparent to developers using String.

4. Compact Strings vs. Compressed Strings

In the case of JDK 6 Compressed Strings, a major problem was that the String constructor accepted only a char[] as an argument. In addition, many String operations depended on the char[] representation rather than a byte array. As a result, a lot of unpacking had to be done, which hurt performance.
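To give a feel for what "unpacking" involves, here is a minimal sketch of widening a LATIN-1 byte array into a char array. It is not the JDK 6 code, just an illustration of the kind of work involved; InflateSketch and inflate are hypothetical names.

```java
// Hypothetical sketch of unpacking one-byte characters into 16-bit chars.
class InflateSketch {

    // Allocates a new array and copies every element, so the cost grows
    // with the length of the String each time a char[] view is needed.
    static char[] inflate(byte[] latin1) {
        char[] chars = new char[latin1.length];
        for (int i = 0; i < latin1.length; i++) {
            // Mask to treat the byte as unsigned before widening.
            chars[i] = (char) (latin1[i] & 0xFF);
        }
        return chars;
    }

    public static void main(String[] args) {
        byte[] packed = {0x4A, 0x44, 0x4B}; // "JDK" in LATIN-1
        System.out.println(new String(inflate(packed))); // prints JDK
    }
}
```

An operation that needs the char[] form repeatedly would pay this allocation and copy each time, which is one way to picture the overhead mentioned above.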

With Compact Strings, on the other hand, maintaining the extra coder field can also add overhead. To mitigate the cost of the coder and of unpacking bytes to chars (in the case of the UTF-16 representation), some of the methods are intrinsified, and the assembly code generated by the JIT compiler has also been improved.

This change resulted in some counter-intuitive results. The LATIN-1 indexOf(String) calls an intrinsic method, whereas the indexOf(char) does not. In case of UTF-16, both of these methods call an intrinsic method. This issue affects only the LATIN-1 String and will be fixed in future releases.

Thus, Compact Strings are better than the Compressed Strings in terms of performance.

To find out how much memory is saved using the Compact Strings, various Java application heap dumps were analyzed. And, while results were heavily dependent on the specific applications, the overall improvements were almost always considerable.

4.1. Difference in Performance

Let’s see a very simple example of the performance difference between enabling and disabling Compact Strings:

import java.util.List;
import java.util.stream.IntStream;
import static java.util.stream.Collectors.toList;

long startTime = System.currentTimeMillis();

// Create the decimal Strings "1" through "10000000"
List<String> strings = IntStream.rangeClosed(1, 10_000_000)
  .mapToObj(Integer::toString)
  .collect(toList());

long totalTime = System.currentTimeMillis() - startTime;
System.out.println(
  "Generated " + strings.size() + " strings in " + totalTime + " ms.");

startTime = System.currentTimeMillis();

// Naively concatenate the first 100,000 Strings, one at a time
String appended = strings.stream()
  .limit(100_000)
  .reduce("", (l, r) -> l + r);

totalTime = System.currentTimeMillis() - startTime;
System.out.println("Created string of length " + appended.length()
  + " in " + totalTime + " ms.");

Here, we create 10 million Strings and then append the first 100,000 of them in a naive manner. When we run this code (Compact Strings are enabled by default), we get the output:

Generated 10000000 strings in 854 ms.
Created string of length 488895 in 5130 ms.

Similarly, if we run it with Compact Strings disabled using the -XX:-CompactStrings option, the output is:

Generated 10000000 strings in 936 ms.
Created string of length 488895 in 9727 ms.

Clearly, this is a surface level test, and it can’t be highly representative – it’s only a snapshot of what the new option may do to improve performance in this particular scenario.
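One way to make such a quick comparison slightly less noisy is to warm up the JIT first and keep the best of several runs. The sketch below does that for the naive append step only. It is still not a rigorous benchmark (a dedicated harness such as JMH would be a better fit), and the class AppendTimingSketch, its methods, and the chosen counts are all hypothetical.

```java
// Hypothetical sketch: repeated timing of the naive append step.
class AppendTimingSketch {

    // Accumulates result lengths so the appended Strings are actually used.
    static long sink;

    // Concatenates "1", "2", ..., up to count, one String at a time.
    static String naiveAppend(int count) {
        String result = "";
        for (int i = 1; i <= count; i++) {
            result += Integer.toString(i);
        }
        return result;
    }

    // Runs the append several times and returns the fastest elapsed time.
    static long bestOfNanos(int runs, int count) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            String s = naiveAppend(count);
            long elapsed = System.nanoTime() - start;
            sink += s.length();
            best = Math.min(best, elapsed);
        }
        return best;
    }

    public static void main(String[] args) {
        bestOfNanos(3, 20_000);             // warm-up runs, result ignored
        long best = bestOfNanos(5, 20_000); // measured runs
        System.out.println("Best of 5: " + (best / 1_000_000) + " ms");
    }
}
```

Running the class once with default settings and once with -XX:-CompactStrings gives two numbers to compare; the absolute values will vary by machine and JVM version.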

5. Conclusion

In this tutorial, we saw the attempts to optimize the performance and memory consumption on the JVM – by storing Strings in a memory efficient way.

As always, the entire code is available over on GitHub.
