
1. Overview

Compilers and runtimes tend to optimize everything, even the smallest and seemingly less critical parts. When it comes to these sorts of optimizations, the JVM and Java have a lot to offer.

In this article, we're going to evaluate one of these relatively new optimizations: string concatenation with invokedynamic.

2. Before Java 9

Before Java 9, non-trivial string concatenations were implemented using StringBuilder. For instance, let's consider the following method:

String concat(String s, int i) {
    return s + i;
}

The bytecode for this simple code is as follows (with javap -c):

java.lang.String concat(java.lang.String, int);
  Code:
     0: new           #2      // class StringBuilder
     3: dup
     4: invokespecial #3      // Method StringBuilder."<init>":()V
     7: aload_0
     8: invokevirtual #4      // Method StringBuilder.append:(LString;)LStringBuilder;
    11: iload_1
    12: invokevirtual #5      // Method StringBuilder.append:(I)LStringBuilder;
    15: invokevirtual #6      // Method StringBuilder.toString:()LString;
    18: areturn

Here, the Java 8 compiler is using StringBuilder to concatenate the method inputs, even though we didn't use StringBuilder in our code.
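To make this translation concrete, here's a hand-written sketch of roughly what the Java 8 bytecode above does, expressed back in Java source. The class name Java8StyleConcat and the method name concatLikeJava8 are our own, for illustration only.

```java
class Java8StyleConcat {

    // What we write in source code
    static String concat(String s, int i) {
        return s + i;
    }

    // Roughly what a pre-Java 9 javac emitted for the method above:
    // new StringBuilder(), append(String), append(int), toString()
    static String concatLikeJava8(String s, int i) {
        return new StringBuilder()
          .append(s)
          .append(i)
          .toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("The answer is ", 42));          // The answer is 42
        System.out.println(concatLikeJava8("The answer is ", 42)); // The answer is 42
    }
}
```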

To be fair, concatenating strings using StringBuilder is pretty efficient and well-engineered.

Let's see how Java 9 changes this implementation and what motivated the change.

3. Invoke Dynamic

As of Java 9, and as part of JEP 280, string concatenation now uses invokedynamic.

The primary motivation behind the change is to have a more dynamic implementation. That is, it's possible to change the concatenation strategy without changing the bytecode. This way, clients can benefit from a new optimized strategy even without recompilation.

There are other advantages, too. For example, the bytecode for invokedynamic is more elegant, less brittle, and smaller.

3.1. Big Picture

Before diving into the details of how this new approach works, let's see it from a broader point of view.

As an example, suppose we're going to create a new String by joining another String with an int. We can think of this as a function that accepts a String and an int and then returns the concatenated String.

Here's how the new approach works for this example:

  • Preparing the function signature describing the concatenation. For instance, (String, int) -> String
  • Preparing the actual arguments for the concatenation. For instance, if we're going to join “The answer is ” and 42, then these values will be the arguments
  • Calling the bootstrap method and passing the function signature, the arguments, and a few other parameters to it
  • Generating the actual implementation for that function signature and encapsulating it inside a MethodHandle
  • Calling the generated function to create the final joined string

Put simply, the bytecode defines a specification at compile time. Then, at runtime, the bootstrap method links an implementation to that specification. This, in turn, makes it possible to change the implementation without touching the bytecode.
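We can walk through these steps by hand, since StringConcatFactory is a public API in java.lang.invoke. In the sketch below, our code plays the role the JVM normally plays when it links the invokedynamic call site. The class name IndyConcatSteps is hypothetical. The name argument matches the one javac uses, but the bootstrap method doesn't depend on it here.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

class IndyConcatSteps {

    static String joinByHand(String s, int i) {
        try {
            // Step 1: the function signature (String, int) -> String
            MethodType signature = MethodType.methodType(String.class, String.class, int.class);

            // Step 3: call the bootstrap method with a lookup, a name, the signature and a recipe;
            // this recipe has two argument slots and no constant text
            CallSite callSite = StringConcatFactory.makeConcatWithConstants(
              MethodHandles.lookup(), "makeConcatWithConstants", signature, "\u0001\u0001");

            // Step 4: the generated implementation, wrapped in a MethodHandle
            MethodHandle concat = callSite.getTarget();

            // Steps 2 and 5: pass the actual arguments and call the generated function
            return (String) concat.invokeExact(s, i);
        } catch (Throwable t) {
            // invokeExact declares Throwable; wrap it to keep the sketch's signature simple
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(joinByHand("The answer is ", 42)); // The answer is 42
    }
}
```

The cast to String and the exact argument types matter here. invokeExact only succeeds when the call's static types match the (String, int) -> String signature precisely.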

Throughout this article, we'll uncover the details associated with each of these steps.

First, let's see how the linkage to the bootstrap method works.

4. The Linkage

Let's see how the Java 9+ compiler generates the bytecode for the same method:

java.lang.String concat(java.lang.String, int);
  Code:
     0: aload_0
     1: iload_1
     2: invokedynamic #7,  0   // InvokeDynamic #0:makeConcatWithConstants:(LString;I)LString;
     7: areturn

Compared to the StringBuilder approach, this one uses significantly fewer instructions.

In this bytecode, the (LString;I)LString; signature is quite interesting. It takes a String and an int (I stands for int) and returns the concatenated String, matching the two operands our method joins.
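This descriptor notation is the JVM's internal form of a method type, and javap abbreviates java/lang/String to String here. As a quick check, java.lang.invoke.MethodType can print the full descriptor for the same (String, int) -> String signature. The class name DescriptorDemo is just for illustration.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {

    // The (String, int) -> String signature of our concat method
    static final MethodType CONCAT_TYPE =
      MethodType.methodType(String.class, String.class, int.class);

    public static void main(String[] args) {
        // The unabbreviated JVM descriptor: (Ljava/lang/String;I)Ljava/lang/String;
        System.out.println(CONCAT_TYPE.toMethodDescriptorString());

        // The human-readable form, using simple type names: (String,int)String
        System.out.println(CONCAT_TYPE);
    }
}
```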

As with other invokedynamic use cases, much of the logic moves from compile time to runtime.

To see that runtime logic, let's inspect the bootstrap method table (with javap -c -v):

BootstrapMethods:
  0: #25 REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:
    (Ljava/lang/invoke/MethodHandles$Lookup;
     Ljava/lang/String;
     Ljava/lang/invoke/MethodType;
     Ljava/lang/String;
     [Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
    Method arguments:
      #31 \u0001\u0001

In this case, when the JVM sees the invokedynamic instruction for the first time, it calls the makeConcatWithConstants bootstrap method. The bootstrap method will, in turn, return a ConstantCallSite, which points to the concatenation logic.
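The sketch below imitates this link-once behavior. It calls the same bootstrap method with the recipe listed in the BootstrapMethods table above and keeps the resulting call site. It then reuses that call site for every later call. We invoke the bootstrap method ourselves only to observe what it returns; in compiled code, the JVM does this on the first execution of the instruction. The class name LinkOnce is hypothetical.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatException;
import java.lang.invoke.StringConcatFactory;

class LinkOnce {

    // Linked once, like an invokedynamic call site on its first execution
    static final CallSite CALL_SITE = link();

    private static CallSite link() {
        try {
            return StringConcatFactory.makeConcatWithConstants(
              MethodHandles.lookup(),
              "makeConcatWithConstants",
              MethodType.methodType(String.class, String.class, int.class),
              "\u0001\u0001");
        } catch (StringConcatException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Every call reuses the already linked target; no further bootstrapping happens
    static String concat(String s, int i) {
        try {
            return (String) CALL_SITE.getTarget().invokeExact(s, i);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        // Expected to print true, since the bootstrap method hands back a ConstantCallSite
        System.out.println(CALL_SITE instanceof ConstantCallSite);
        System.out.println(concat("The answer is ", 42)); // The answer is 42
        System.out.println(concat("Java ", 9));          // Java 9
    }
}
```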


Among the arguments passed to the bootstrap method, two stand out:

  • Ljava/lang/invoke/MethodType represents the string concatenation signature. In this case, it's (LString;I)LString; since we're combining a String with an int
  • \u0001\u0001 is the recipe for constructing the string (more on this later)

5. Recipes

To better understand the role of recipes, let's consider a simple data class:

public class Person {

    private String firstName;
    private String lastName;

    public Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public String toString() {
        return "Person{" +
          "firstName='" + firstName + '\'' +
          ", lastName='" + lastName + '\'' +
          '}';
    }
}

To generate a String representation, the compiled toString() method loads the firstName and lastName fields and passes them to the invokedynamic instruction as arguments:

   0: aload_0
   1: getfield      #7        // Field firstName:LString;
   4: aload_0
   5: getfield      #13       // Field lastName:LString;
   8: invokedynamic #16,  0   // InvokeDynamic #0:makeConcatWithConstants:(LString;LString;)LString;
  13: areturn

This time, the bootstrap method table looks a bit different:

BootstrapMethods:
  0: #28 REF_invokeStatic StringConcatFactory.makeConcatWithConstants // truncated
    Method arguments:
      #34 Person{firstName=\'\u0001\', lastName=\'\u0001\'} // The recipe

As shown above, the recipe represents the basic structure of the concatenated String. For instance, the preceding recipe consists of:

  • Constant strings such as “Person{firstName='”. These literal values will be present in the concatenated string as-is
  • Two \u0001 tags to represent ordinary arguments. They will be replaced by the actual arguments such as firstName
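We can also see the recipe at work outside compiled bytecode. This sketch hands the same recipe to StringConcatFactory ourselves, with a (String, String) -> String signature for the two fields. The result should match what Person.toString() builds. The class name RecipeDemo is hypothetical, and plain strings stand in for a Person instance so the example stays self-contained.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

class RecipeDemo {

    // Same recipe as in the BootstrapMethods table: constant text plus two argument slots
    static final String RECIPE = "Person{firstName='\u0001', lastName='\u0001'}";

    static String personString(String firstName, String lastName) {
        try {
            MethodHandle concat = StringConcatFactory.makeConcatWithConstants(
              MethodHandles.lookup(),
              "makeConcatWithConstants",
              MethodType.methodType(String.class, String.class, String.class),
              RECIPE).getTarget();
            // Only the two dynamic values are passed; the constants live in the recipe
            return (String) concat.invokeExact(firstName, lastName);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(personString("John", "Doe")); // Person{firstName='John', lastName='Doe'}
    }
}
```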

We can think of the recipe as a templated String containing both static parts and variable placeholders.
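The templating idea can be illustrated with a toy recipe interpreter. This is our own simplified sketch, not how StringConcatFactory works internally. The JDK generates a specialized MethodHandle per call site instead of scanning the recipe on every call. The names ToyRecipe and apply are hypothetical.

```java
class ToyRecipe {

    // Marks an argument slot in the recipe (the same tag javac uses)
    static final char TAG_ARG = '\u0001';

    // Replaces each TAG_ARG with the next argument, in order,
    // and copies every other character through as constant text
    static String apply(String recipe, Object... args) {
        StringBuilder result = new StringBuilder();
        int next = 0;
        for (int i = 0; i < recipe.length(); i++) {
            char c = recipe.charAt(i);
            if (c != TAG_ARG) {
                result.append(c);
            } else if (next < args.length) {
                result.append(args[next++]);
            } else {
                throw new IllegalArgumentException("Not enough arguments for recipe");
            }
        }
        if (next != args.length) {
            throw new IllegalArgumentException("Too many arguments for recipe");
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String recipe = "Person{firstName='\u0001', lastName='\u0001'}";
        System.out.println(apply(recipe, "John", "Doe")); // Person{firstName='John', lastName='Doe'}
    }
}
```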

Using recipes can dramatically reduce the number of arguments involved: the call site passes only the dynamic values, while all constant parts travel to the bootstrap method inside a single recipe.

6. Bytecode Flavors

There are two bytecode flavors for the new concatenation approach. So far, we're familiar with one of them: calling the makeConcatWithConstants bootstrap method and passing a recipe. This flavor, known as indy with constants, is the default as of Java 9.

Instead of using a recipe, the second flavor passes everything as arguments. That is, it doesn't differentiate between constant and dynamic parts and passes all of them as arguments.

To use the second flavor, we should pass the -XDstringConcat=indy option to the Java compiler. For instance, if we compile the same Person class with this flag, then the compiler generates the following bytecode:

public java.lang.String toString();
    Code:
       0: ldc           #16      // String Person{firstName=\'
       2: aload_0
       3: getfield      #7       // Field firstName:LString;
       6: bipush        39
       8: ldc           #18      // String , lastName=\'
      10: aload_0
      11: getfield      #13      // Field lastName:LString;
      14: bipush        39
      16: bipush        125
      18: invokedynamic #20,  0  // InvokeDynamic #0:makeConcat:(LString;LString;CLString;LString;CC)LString;
      23: areturn

This time around, the bootstrap method is makeConcat. Moreover, the concatenation signature takes seven arguments. Each argument represents one part from toString:

  • The first argument represents the part before the firstName variable — the “Person{firstName=\'” literal
  • The second argument is the value of the firstName field
  • The third argument is a single quotation character
  • The fourth argument is the part before the next variable — “, lastName=\'”
  • The fifth argument is the lastName field
  • The sixth argument is a single quotation character
  • The last argument is the closing curly bracket

This way, the bootstrap method has enough information to link the appropriate concatenation logic.
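For comparison, here's the second flavor driven by hand through StringConcatFactory.makeConcat. Without a recipe, the signature must describe all seven parts, constants included, exactly as in the (LString;LString;CLString;LString;CC)LString; descriptor above. The class name MakeConcatDemo is hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

class MakeConcatDemo {

    // (String, String, char, String, String, char, char) -> String
    static final MethodType SEVEN_PARTS = MethodType.methodType(String.class,
      String.class, String.class, char.class,
      String.class, String.class, char.class, char.class);

    static String personString(String firstName, String lastName) {
        try {
            MethodHandle concat = StringConcatFactory.makeConcat(
              MethodHandles.lookup(), "makeConcat", SEVEN_PARTS).getTarget();
            // Constants and variables are all passed as ordinary arguments
            return (String) concat.invokeExact(
              "Person{firstName='", firstName, '\'',
              ", lastName='", lastName, '\'', '}');
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(personString("John", "Doe")); // Person{firstName='John', lastName='Doe'}
    }
}
```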

Quite interestingly, it's also possible to travel back to the pre-Java 9 world and use StringBuilder with the -XDstringConcat=inline compiler option.

7. Strategies

The bootstrap method eventually provides a MethodHandle that points to the actual concatenation logic. As of this writing, there are six different strategies to generate this logic:

  • BC_SB or “bytecode StringBuilder” strategy generates the same StringBuilder bytecode at runtime. Then it loads the generated bytecode via the Unsafe.defineAnonymousClass method
  • BC_SB_SIZED strategy will try to guess the necessary capacity for StringBuilder. Other than that, it's identical to the previous approach. Guessing the capacity can potentially help the StringBuilder to perform the concatenation without resizing the underlying byte[]
  • BC_SB_SIZED_EXACT is a bytecode generator based on StringBuilder that computes the required storage exactly. To calculate the exact size, first, it converts all arguments to String
  • MH_SB_SIZED is based on MethodHandles and eventually calls the StringBuilder API for concatenation. This strategy also makes an educated guess about the required capacity
  • MH_SB_SIZED_EXACT is similar to the previous one except it calculates the necessary capacity with complete accuracy
  • MH_INLINE_SIZED_EXACT calculates the required storage upfront and directly maintains its byte[] to store the concatenation result. This strategy is inline because it replicates what StringBuilder does internally
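The difference between building through an exactly sized StringBuilder and filling a buffer directly can be sketched in plain Java. The two methods below are simplified illustrations of the ideas behind BC_SB_SIZED_EXACT and the inline, exactly sized strategy. They aren't the JDK's actual code. The real strategies generate specialized code per call site. The inline one also writes into the String's internal byte[], which ordinary code can't access. Our sketch therefore uses a char[] and pays for one extra copy in new String. The names SizedExactSketch, viaSizedStringBuilder and viaExactArray are hypothetical.

```java
class SizedExactSketch {

    // Converts every argument to String up front, which is what makes the exact length known
    private static String[] stringify(Object... args) {
        String[] parts = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            parts[i] = String.valueOf(args[i]);
        }
        return parts;
    }

    private static int totalLength(String[] parts) {
        int length = 0;
        for (String part : parts) {
            length += part.length();
        }
        return length;
    }

    // Idea behind BC_SB_SIZED_EXACT: a StringBuilder with exactly the needed capacity
    static String viaSizedStringBuilder(Object... args) {
        String[] parts = stringify(args);
        StringBuilder sb = new StringBuilder(totalLength(parts)); // never needs to grow
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    // Idea behind the inline strategy: size the storage upfront and fill it directly
    static String viaExactArray(Object... args) {
        String[] parts = stringify(args);
        char[] buffer = new char[totalLength(parts)];
        int offset = 0;
        for (String part : parts) {
            part.getChars(0, part.length(), buffer, offset);
            offset += part.length();
        }
        return new String(buffer);
    }

    public static void main(String[] args) {
        System.out.println(viaSizedStringBuilder("The answer is ", 42)); // The answer is 42
        System.out.println(viaExactArray("The answer is ", 42));         // The answer is 42
    }
}
```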

The default strategy is MH_INLINE_SIZED_EXACT. However, we can change this strategy using the -Djava.lang.invoke.stringConcat=<strategyName> system property.

8. Conclusion

In this article, we looked at how the new String concatenation is implemented and the advantages of this approach.

For an even more detailed discussion, it's a good idea to check out the experimental notes or even the source code.
