
1. Introduction

Kotlin is much more expressive and brief than Java, but does it have a cost? To be more precise, is there any performance penalty for choosing Kotlin over Java?

For the most part, Kotlin compiles into the same bytecode as Java. Classes, functions, function arguments, and standard flow control operators, such as if and for, work in the same way.

However, there are differences between Kotlin and Java. For example, Kotlin has inline functions. If such a function takes a lambda as an argument, there's no actual lambda object in the bytecode. Instead, the compiler copies the function body, with the lambda's instructions embedded where they're invoked, directly to the call site. All of the collection transformation functions – such as map, filter, associate, first, find, any, and many others – are inline functions.

In addition to that, to use functional transformations on collections in Java, we have to create a Stream out of a collection and later collect that Stream with a Collector to the target collection. When we need to do a series of transformations over a large collection, this makes sense. However, when we only need to map our short collection once and get the result, the penalty of extra object creation is comparable to the useful payload.
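To illustrate the difference in moving parts, here's a small sketch (class and method names are ours, for illustration): the Stream version builds a Stream pipeline and a Collector, while the plain loop – roughly what Kotlin's inlined map compiles down to – allocates only the target list:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class StreamVsLoop {

    // Stream version: creates a Stream object and a Collector on every call
    static List<String> withStream(List<Integer> xs) {
        return xs.stream().map(String::valueOf).collect(Collectors.toList());
    }

    // Loop version: no intermediate pipeline objects, just the result list
    static List<String> withLoop(List<Integer> xs) {
        List<String> out = new ArrayList<>(xs.size());
        for (int x : xs) {
            out.add(String.valueOf(x));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3);
        System.out.println(withStream(xs)); // [1, 2, 3]
        System.out.println(withLoop(xs));   // [1, 2, 3]
    }
}
```

Both produce the same result; the difference is purely in how many helper objects are created along the way.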

In this article, we’ll look at how we can measure differences between Java and Kotlin code and also analyze how big this difference is.

2. Java Microbenchmark Harness

As Kotlin compiles to the same JVM bytecode as Java, we can use Java Microbenchmark Harness (JMH) to analyze the performance of both Java and Kotlin code. To set up the project, we'll create a simple Gradle-based project and use a neat little plugin to connect to the JMH framework.
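One option is the me.champeau.jmh Gradle plugin; the version below is illustrative, so check the plugin portal for the current one:

```groovy
plugins {
    id 'java'
    id 'org.jetbrains.kotlin.jvm' version '1.6.10'
    id 'me.champeau.jmh' version '0.6.6'
}
```

With the plugin applied, benchmarks placed under src/jmh can be run with the jmh Gradle task.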

Let’s write some test cases. The numbers we’ll see were acquired on an Apple M1 Pro laptop with 32GB of RAM, running macOS Monterey 12.1 and OpenJDK 64-Bit Server VM 17.0.1+12-39. Mileage may vary on other systems and software versions.

We’ll return a value from each of our test functions and put it into a Blackhole. This will prevent the JIT compiler from optimizing away the code under test. We’ll use a reasonable number of warmup iterations in order to get properly averaged results:

@Benchmark
@BenchmarkMode(Mode.Throughput)
@Fork(value = 5, warmups = 5)
@OutputTimeUnit(TimeUnit.MILLISECONDS)

These results, and those that follow, describe the throughput: the number of operations per unit of time – the higher the number, the faster the operation. The time unit will differ from test to test: we choose it so that the resulting numbers are neither too big nor too small. The choice is irrelevant to the experiment itself, but numbers of a reasonable size are easier to read and compare.

3. Inline Higher-Order Functions

The first case is inline higher-order functions. Our function will imitate executing an action under transaction: We’ll create a transaction object, open it, perform the action passed as an argument, and then commit the transaction.

The Java way of dealing with a lambda involves the invokedynamic instruction in the bytecode. The first time it executes, this instruction invokes a bootstrap method (LambdaMetafactory) that creates a call site bound to a generated class implementing the functional interface. All of this happens behind the scenes, and we just create a lambda:

public static <T> T inTransaction(JavaDatabaseTransaction.Database database, Function<JavaDatabaseTransaction.Database, T> action) {
    var transaction = new JavaDatabaseTransaction.Transaction(UUID.randomUUID());
    try {
        var result = action.apply(database);
        transaction.commit();
        return result;
    } catch (Exception ex) {
        transaction.rollback();
        throw ex;
    }
}

public static String transactedAction(Object obj) throws MalformedURLException {
    var database = new JavaDatabaseTransaction.Database(new URL("http://localhost"), "user:pass");
    return inTransaction(database, d -> UUID.randomUUID() + obj.toString());
}
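We can even glimpse the generated class at runtime. As a quick sketch (the exact class name is JVM-specific; on HotSpot it contains "Lambda"):

```java
import java.util.function.Function;

public class LambdaClassDemo {
    public static void main(String[] args) {
        Function<Integer, Integer> inc = x -> x + 1;
        // The lambda is backed by a class generated at runtime by
        // LambdaMetafactory; on HotSpot its name contains "Lambda".
        String name = inc.getClass().getName();
        System.out.println(name.contains("Lambda"));
    }
}
```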

The Kotlin way of doing this is very similar, though more concise. At the bytecode level, however, the story is completely different: the body of inTransaction is simply copied to the call site, and there’s no lambda object at all:

inline fun <T> inTransaction(db: Database, action: (Database) -> T): T {
    val transaction = Transaction(id = UUID.randomUUID())
    try {
        return action(db).also { transaction.commit() }
    } catch (ex: Exception) {
        transaction.rollback()
        throw ex
    }
}

fun transactedAction(arg: Any): String {
    val database = Database(URL("http://localhost"), "user:pass")
    return inTransaction(database) { UUID.randomUUID().toString() + arg.toString() }
}

In the real case of doing something with the database, the difference between the two approaches will be negligible, as the driving cost here will be the network IO. But let’s see if inlining provides much of a difference:

Benchmark                            Mode  Cnt     Score       Error   Units
KotlinVsJava.inlinedLambdaKotlin     thrpt  25     1433.691 ± 108.326  ops/ms
KotlinVsJava.lambdaJava              thrpt  25      993.428 ±  25.065  ops/ms

As it turns out, it does! Inlining appears to be about 44% more efficient than generating a dynamic call site at runtime. That means that for higher-order functions, especially on a critical path, we should consider inline.

4. Functional Collection Transformations

As we already stated, the Java Stream API requires one more object creation than the Kotlin Collections library functions. Let’s see if this is noticeable. We’ll put our model list of strings into the @State(Scope.Benchmark) container so that the JIT compiler won’t optimize away our repeated operations.

The Java implementation is quite short:

public static List<String> transformStringList(List<String> strings) {
    return strings.stream().map(s -> s + System.currentTimeMillis()).collect(Collectors.toList());
}

And the Kotlin one is even shorter:

fun transformStringList(strings: List<String>) =
    strings.map { it + System.currentTimeMillis() }

We’re using currentTimeMillis so that each of our strings is different on every call. This prevents the JIT compiler from constant-folding away the code we run.

The results of that run aren’t conclusive:

Benchmark                              Mode    Cnt   Score     Error    Units
KotlinVsJava.stringCollectionJava    thrpt   25    1982.486 ± 112.839 ops/ms
KotlinVsJava.stringCollectionKotlin  thrpt   25    1760.223 ± 69.072  ops/ms

It even looks like Java is 12% faster. This can be explained by the fact that Kotlin does additional implicit checks, like null-checks for non-nullable arguments, which become visible if we look at the bytecode:

  public final static transformStringList(Ljava/util/List;)Ljava/util/List;
  // skip irrelevant stuff
   L0
    ALOAD 0
    LDC "strings"
    INVOKESTATIC kotlin/jvm/internal/Intrinsics.checkNotNullParameter (Ljava/lang/Object;Ljava/lang/String;)V
...

There are also additional checks in the map function to ensure the correct ArrayList size. As we don’t do much else in our function, these small things start to show themselves.

The effect of an extra instantiation should be more noticeable when we deal with short collections and do cheap transformations. In that case, the creation of a Stream object will be noticeable:

fun mapSmallCollection() =
    (1..10).map { java.lang.String.valueOf(it) }

And for the Java version:

public static List<String> transformSmallList() {
    return IntStream.rangeClosed(1, 10) // inclusive range, to match Kotlin's 1..10
      .mapToObj(String::valueOf)
      .collect(Collectors.toList());
}

We can see that the difference went the other way:

Benchmark                              Mode    Cnt   Score     Error    Units
KotlinVsJava.smallCollectionJava     thrpt   25    15.135 ± 0.932     ops/us
KotlinVsJava.smallCollectionKotlin   thrpt   25    17.826 ± 0.332     ops/us

On the other hand, both of the operations are extremely fast and the difference is unlikely to be a significant factor in actual production code.

5. Variable Arguments With a Spread Operator

In Java, the variable arguments construct is just syntactic sugar: each such parameter is actually an array. If we already have our data in an array, we can pass it straight away as an argument:

public static String concatenate(String... pieces) {
    StringBuilder sb = new StringBuilder(pieces.length * 8);
    for(String p : pieces) {
        sb.append(p).append(",");
    }
    return sb.toString();
}

public static String callConcatenate(String[] pieces) {
    return concatenate(pieces);
}
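We can verify that Java passes the array itself, not a copy. In this small sketch (names are ours), a mutation through the varargs parameter is visible to the caller:

```java
public class VarargsAliasing {

    // A varargs parameter is just an array at the bytecode level
    static void setFirst(String... pieces) {
        pieces[0] = "changed";
    }

    public static void main(String[] args) {
        String[] data = {"a", "b"};
        setFirst(data); // the array itself is passed; no copy is made
        System.out.println(data[0]); // prints "changed"
    }
}
```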

In Kotlin, however, a variable argument is a special case. If our data are already in an array, we’ll have to spread that array:

fun concatenate(vararg pieces: String): String = pieces.joinToString()

fun callVarargFunction(pieces: Array<out String>) = concatenate(*pieces)

Let’s see if that means any penalty in performance:

Benchmark                              Mode    Cnt   Score     Error    Units
KotlinVsJava.varargsJava               thrpt    25    14.653 ± 0.089    ops/us
KotlinVsJava.varargsKotlin             thrpt    25    12.468 ± 0.279    ops/us

And indeed, Kotlin is penalized about 17% for spreading (which involves an array copy). This means that in performance-critical sections of our code, it is better to avoid using spreading to call a function with vararg arguments.

6. Changing a Java Bean vs. Copying a Data Class, Initialization Included

Kotlin introduces the concept of data classes and promotes heavy use of immutable val fields, as opposed to traditional Java fields, which are editable via setter methods. If we need to change a field in a data class, we have to use the copy method and substitute the changing field with the desired value:

fun changeField(input: DataClass, newValue: String): DataClass = input.copy(fieldA = newValue)

This leads to the instantiation of a new object. Let’s check if it takes significantly more time than using Java-style editable fields. For the experiment, we’ll create a minimal data class with just one field:

data class DataClass(val fieldA: String)

We’ll also use the simplest possible POJO:

public class POJO {
    private String fieldA;

    public POJO(String fieldA) {
        this.fieldA = fieldA;
    }

    public String getFieldA() {
        return fieldA;
    }

    public void setFieldA(String fieldA) {
        this.fieldA = fieldA;
    }
}
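For comparison, the immutable style can also be sketched in plain Java using a record (names here are ours, for illustration); the withFieldA method plays the role of Kotlin's copy, returning a fresh instance instead of mutating the original:

```java
public class CopyVsMutate {

    // Immutable, data-class style: an update returns a new object
    record Immutable(String fieldA) {
        Immutable withFieldA(String value) {
            return new Immutable(value);
        }
    }

    public static void main(String[] args) {
        Immutable first = new Immutable("ABC");
        Immutable second = first.withFieldA("XYZ"); // fresh instance, like copy()
        System.out.println(first.fieldA() + " " + second.fieldA()); // ABC XYZ
    }
}
```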

To prevent JIT from over-optimizing the test code, let’s put both the initial field value and the new value into a @State object:

@State(Scope.Benchmark)
public static class InputString {
    public String string1 = "ABC";
    public String string2 = "XYZ";
}

Now, let’s run the test, which creates an object, changes it and passes it to a Blackhole:

Benchmark                              Mode    Cnt   Score     Error    Units
KotlinVsJava.changeFieldJava           thrpt   25    337.300 ± 1.263     ops/us
KotlinVsJava.changeFieldKotlin         thrpt   25    351.128 ± 0.910     ops/us

The experiment shows that copying an object is actually about 4% faster than changing its field. In fact, further study shows that the performance difference between the two approaches grows with the number of fields. Let’s take a more complex data class:

data class DataClass(
    val fieldA: String,
    val fieldB: String,
    val addressLine1: String,
    val addressLine2: String,
    val city: String,
    val age: Int,
    val salary: BigDecimal,
    val currency: Currency,
    val child: InnerDataClass
)

data class InnerDataClass(val fieldA: String, val fieldB: String)

And an analogous Java POJO. The results of a similar benchmark look like this:

Benchmark                        Mode   Cnt    Score     Error   Units
KotlinVsJava.changeFieldJava     thrpt   25    100.503 ± 1.047    ops/us
KotlinVsJava.changeFieldKotlin   thrpt   25    126.282 ± 0.232    ops/us

This might seem counterintuitive, but note that compared to the trivial one-field data class, this one runs almost three times slower overall. Remember also that our benchmark includes the initial construction as well as the change of a field. As was explored previously, the final keyword sometimes has an effect on performance, and it is often small but positive. Apparently, the constructor cost dominates this benchmark.

7. Changing a Java Bean vs. Copying a Data Class, Initialization Excluded

If we isolate copying for Kotlin and modifying for Java with a @State object:

@State(Scope.Thread)
public static class InputKotlin {
    public DataClass pojo;

    @Setup(Level.Trial)
    public void setup() {
        pojo = new DataClass(
                "ABC",
                "fieldB",
                "Baker st., 221b",
                "Marylebone",
                "London",
                (int) (31 + System.currentTimeMillis() % 17),
                new BigDecimal("30000.23"),
                Currency.getInstance("GBP"),
                new InnerDataClass("a", "b")
        );
    }

    public String string2 = "XYZ";
}

@Benchmark
@BenchmarkMode(Mode.Throughput)
@Fork(value = 5, warmups = 5)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void changeFieldKotlin_changingOnly(Blackhole blackhole, InputKotlin input) {
    blackhole.consume(DataClassKt.changeField(input.pojo, input.string2));
}

we’ll see another picture:

Benchmark                                     Mode    Cnt    Score     Error   Units
KotlinVsJava.changeFieldJava_changingOnly     thrpt   25     364.745 ± 2.470    ops/us
KotlinVsJava.changeFieldKotlin_changingOnly   thrpt   25     163.215 ± 1.235    ops/us

So it would seem that modifying is about 2.2 times faster than copying. However, instantiating a mutable object is quite a bit slower: we can construct an immutable object and then copy it – two instantiations – and still pull ahead of constructing a mutable object and then mutating it.

All in all, the immutable approach will be slower, but not by orders of magnitude. Moreover, if we need to change more than one property, the balance shifts: for a Kotlin data class it’s still a single copy() call, while for a mutable object it means several setter calls. Also, instantiating a mutable object is more costly than instantiating an immutable one, so depending on the overall code, the final score might not be in favour of the POJO. And finally, all these costs are so small that in a real application they’ll be dominated by IO and business logic.

Therefore, we may conclude that using immutable structures and copying instead of using setters is not going to affect the performance of our program significantly.

8. Conclusion

In the article, we looked at how we can check our hypotheses about Kotlin performance in comparison to Java. Mostly, as expected, Kotlin’s performance is comparable to Java’s performance. There are small gains in some places, like inlining lambdas. Conversely, there are clear losses in others, like spreading an array in a vararg argument.

All things being fairly equal, it is clear that inlining functions that take other functions or lambdas as parameters is quite beneficial.

What is important, however, is that Kotlin offers pretty much the same runtime performance as Java. Its use won’t become a problem in production.

We also learned how to quickly and efficiently use the JMH framework to gauge the performance of our code. With a sensible performance test harness, we can predict problems in production before they begin. An important takeaway is that many things can affect a JMH test, and all benchmark results should be taken with a grain of salt.

As always, code examples can be found over on GitHub.
