Effect of Idempotence on the Performance of a Kafka Producer
Last updated: January 26, 2026
1. Introduction
In this tutorial, we’ll explore the concept of idempotence, how it applies to Kafka producers, and its impact on performance.
After introducing the core concepts, we’ll set up a benchmark using the Java Microbenchmark Harness (JMH) to measure how idempotent producers impact performance in Kafka.
Lastly, we’ll analyze the benchmark results to determine when idempotence should be disabled and when it should remain enabled.
2. Idempotence Meaning in Kafka
Let’s begin by understanding what idempotence means. In general terms, idempotence means that performing the same operation multiple times produces the same result as if it were performed once.
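As a minimal illustration of the general definition, applying an idempotent operation twice yields the same result as applying it once. Math.abs is a simple stand-in here, chosen purely for illustration:

```java
public class IdempotenceExample {
    public static void main(String[] args) {
        // Math.abs is idempotent: applying it a second time doesn't change the result.
        int once = Math.abs(-5);
        int twice = Math.abs(Math.abs(-5));
        System.out.println(once == twice); // true
    }
}
```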
In distributed systems, failures such as network issues, broker outages, and timeouts can cause Kafka producers to retry requests, potentially resulting in duplicate records. Luckily, idempotence in Kafka helps address these cases. When we enable idempotency for the Kafka producer, it guarantees that retrying a send operation doesn’t result in duplicate records being written into a topic.
This is implemented at the protocol level, without the need for deduplication on the application level. Each producer instance is assigned a unique identifier, and records are tracked using sequence numbers. In the event of the mentioned issues, this allows the broker to discard duplicate records that may be caused by retrying.
3. Idempotent Producer
As mentioned earlier, an idempotent producer uses a deduplication mechanism at the protocol level: the producer and broker coordinate to ensure that retries don’t result in duplicate message delivery. In other words, an idempotent producer aims to deliver each message exactly once.
In this section, we’ll see how to achieve that result.
3.1. Create an Idempotent Producer
Idempotent producers have been available in Kafka for some time, but they had to be enabled explicitly. Whether idempotence is enabled in the producer is controlled by a boolean property:
enable.idempotence=true
Since version 3.0, Kafka has enforced the strongest delivery guarantees by default, meaning idempotence is now enabled out of the box. However, it’s only enabled by default as long as no conflicting configuration is set. In other words, some producer settings can silently disable idempotence.
To make certain we have it enabled, we have to configure:
- acks = all to ensure that we safely acknowledge messages
- retries > 0 to allow retries (the default is Integer.MAX_VALUE)
- max.in.flight.requests.per.connection <= 5 to keep message ordering intact
It should be noted that the Kafka Java client throws ConfigException if idempotence is explicitly enabled and conflicting configurations are set.
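Putting these requirements together, a minimal safe configuration might look like the following sketch. For readability it uses the literal property names rather than the ProducerConfig constants, but the keys are the same:

```java
import java.util.Properties;

public class IdempotentProducerConfig {

    // Builds a Properties object with the settings required for idempotence.
    public static Properties build() {
        Properties props = new Properties();
        props.put("enable.idempotence", "true");
        props.put("acks", "all");                                  // required: wait for all in-sync replicas
        props.put("retries", Integer.toString(Integer.MAX_VALUE)); // required: retries must be > 0
        props.put("max.in.flight.requests.per.connection", "5");   // required: <= 5 to keep ordering intact
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("acks")); // all
    }
}
```

With any of these three properties set to a conflicting value while enable.idempotence=true, the Kafka client would reject the configuration at startup.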
3.2. How Idempotence Changes Producer Behavior
When we enable idempotence, both the producer and the broker alter their message delivery behavior to ensure exactly-once semantics.
First, the broker assigns a unique Producer ID (PID) to each producer. Next, the producer attaches a sequence number to each batch of records it sends for a given partition. The broker uses the PID and sequence number to identify retries and drop duplicate records rather than writing them again to the topic.
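To make the mechanism concrete, here is a toy, in-memory sketch of broker-side deduplication. This is not Kafka source code, just an illustration of how tracking the last sequence number per producer ID lets a log drop retried batches:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a broker partition log that deduplicates by (PID, sequence number).
public class DedupSketch {
    private final Map<Long, Integer> lastSeqByPid = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    // Appends the record only if its sequence number is new for this producer.
    public boolean append(long pid, int seq, String record) {
        Integer lastSeq = lastSeqByPid.get(pid);
        if (lastSeq != null && seq <= lastSeq) {
            return false; // duplicate caused by a retry: drop it
        }
        lastSeqByPid.put(pid, seq);
        log.add(record);
        return true;
    }

    public List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        DedupSketch broker = new DedupSketch();
        broker.append(42L, 0, "order-1");
        broker.append(42L, 0, "order-1"); // retry of the same batch: dropped
        broker.append(42L, 1, "order-2");
        System.out.println(broker.log()); // [order-1, order-2]
    }
}
```

The real protocol is more involved (it tracks sequence numbers per partition and per batch), but the core idea is the same: a retry reuses the same PID and sequence number, so the broker can recognize and discard it.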
With acks=all, the producer considers a record successfully delivered only after the broker confirms it has been written to all in-sync replicas of the partition. If the producer doesn’t receive an acknowledgment, it assumes the send operation failed and retries automatically, reusing the same PID and sequence number.
At the same time, this mechanism also introduces strict order requirements. We have to keep max.in.flight.requests.per.connection <= 5. Otherwise, if we allow too many concurrent requests, we could break sequence number validation.
Since sequence numbers must arrive in order and the broker must acknowledge each batch on all in-sync replicas before the producer considers it delivered, we can expect some performance cost. As a result of this behavior, we gain correctness, but at the expense of reduced parallelism and throughput.
It’s also worth noting that deduplication works only while the producer is active. After a restart, the producer loses its previous sequence numbers, so Kafka can no longer guarantee that records won’t be duplicated. This behavior can be addressed by transactions, which help to preserve the producer’s state.
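As a hedged sketch of that last point: setting a stable transactional.id (the id value below is an assumption for illustration) ties the producer’s state to a durable identity, and the Kafka client enables idempotence implicitly when transactions are configured:

```java
import java.util.Properties;

public class TransactionalProducerConfig {

    // Builds properties for a transactional (and therefore idempotent) producer.
    public static Properties build(String transactionalId) {
        Properties props = new Properties();
        // A stable transactional.id survives restarts and implies enable.idempotence=true.
        props.put("transactional.id", transactionalId);
        props.put("enable.idempotence", "true"); // stated explicitly for clarity
        props.put("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        // "orders-producer-1" is an illustrative id, not from the article.
        System.out.println(build("orders-producer-1").getProperty("transactional.id"));
    }
}
```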
4. Measuring the Performance Impact
Now that we understand how idempotent producers work, let’s see it in practice. In the following sections, we’ll configure the benchmark to measure the performance of an idempotent versus a non-idempotent producer.
Our goal is to see how idempotence affects the throughput. To improve the reliability of results, we’ll run the same benchmark in two environments. First, we’ll use a single broker without replication. Then, we’ll repeat the benchmark on a cluster of three brokers with a replication factor of three.
4.1. Benchmark Design
To create a stable environment, we use Docker Compose to run a Kafka cluster. To simplify configuration, each node runs both the broker and controller roles (KRaft mode) and exposes an external listener that the benchmark can connect to:
services:
  kafka-1:
    image: apache/kafka:3.9.0
    container_name: kafka-1
    ports:
      - "29092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: "broker,controller"
      KAFKA_LISTENERS: "PLAINTEXT://:19092,CONTROLLER://:19093,EXTERNAL://:9092"
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-1:19092,EXTERNAL://localhost:29092"
      KAFKA_MIN_INSYNC_REPLICAS: 2
      # ...omitted irrelevant config
We repeat the same configuration for the remaining brokers, changing only the node ID and port. For the single broker scenario, we use the same setup but run only one broker and create topics with a replication factor of one.
Regarding the producer setup, both configurations are identical except for the idempotence flag, which is parameterized in the JMH benchmark. Initially, we set required properties acks=all, infinite retries, and a safe limit for the number of requests. Subsequently, we configure explicit batching and timeouts to reduce variability between runs, making our benchmark more stable and trustworthy:
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5");
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, String.valueOf(idempotent));
props.put(ProducerConfig.LINGER_MS_CONFIG, "5");
props.put(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(32 * 1024));
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "none");
props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "30000");
props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000");
Finally, the benchmark method sends records asynchronously, collects futures, and then waits for each one of them to complete:
@Benchmark
@OperationsPerInvocation(MESSAGES)
public void sendMessages() throws Exception {
    @SuppressWarnings("unchecked")
    Future<RecordMetadata>[] futures = new Future[MESSAGES];
    for (int i = 0; i < MESSAGES; i++) {
        long key = counter++;
        futures[i] = producer.send(new ProducerRecord<>(topic, key, value));
    }
    for (Future<RecordMetadata> f : futures) {
        f.get();
    }
}
The reason for this approach is measuring how fast the producer sends requests, the broker persists them across partitions and replicas, and then acknowledges that the events are delivered. That way, we measure real end-to-end producer throughput while allowing Kafka to operate efficiently.
4.2. Interpreting the Results
Starting with a simpler scenario, using a single broker with no replication, we get the following results:
Benchmark (idempotent) Mode Cnt Score Error Units
IdempotenceBenchmark.sendMessages true thrpt 10 24891.897 ± 263.271 ops/s
IdempotenceBenchmark.sendMessages false thrpt 10 24953.439 ± 313.723 ops/s
The score represents the throughput, measured in records per second. In other words, it indicates the number of records the producer successfully sends in one second. The difference between the idempotent and non-idempotent producers is small relative to the overall throughput. Some variation is expected in this type of benchmark.
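As a back-of-the-envelope check of how the score is computed (the numbers below are assumptions, not benchmark output), @OperationsPerInvocation(MESSAGES) tells JMH to count each record as one operation, so the reported ops/s reflects records per second rather than benchmark-method calls per second:

```java
public class ThroughputCheck {
    public static void main(String[] args) {
        int messages = 1_000;            // records sent per benchmark invocation (assumed)
        double invocationSeconds = 0.04; // measured time of one invocation (assumed)
        // One "op" is a single record thanks to @OperationsPerInvocation,
        // so throughput is records divided by invocation time.
        double recordsPerSecond = messages / invocationSeconds;
        System.out.println(recordsPerSecond); // 25000.0
    }
}
```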
Next, we run the benchmark on a three-broker cluster with a replication factor of three:
Benchmark (idempotent) Mode Cnt Score Error Units
IdempotenceBenchmark.sendMessages true thrpt 10 23689.246 ± 568.638 ops/s
IdempotenceBenchmark.sendMessages false thrpt 10 24097.398 ± 500.293 ops/s
The throughput values for both producers remain very close. This level of variation is expected in performance benchmarks, and in some runs, we can even observe the idempotent producer achieving a slightly higher score. At this throughput level, Kafka can process roughly 24000 records per second. Under such conditions, small differences in throughput are almost negligible.
With idempotence enabled, we don’t measure a significant throughput drop, even when using multiple brokers and replicas. These differences can vary from run to run, but the results stay consistent in both scenarios.
5. When to Enable Idempotent Producers
Based on Kafka’s delivery guarantees and our benchmark results, we can see that idempotence doesn’t significantly reduce throughput, but it provides a strong deduplication mechanism. Kafka aims to deliver every record reliably, so idempotence is enabled by default in newer versions.
It’s safe to keep idempotence enabled when avoiding duplicate records and when we want strong delivery guarantees without implementing custom application-level deduplication.
On the other hand, if our primary goal is absolute maximum throughput and duplicates are acceptable, we can disable idempotence. Even then, the performance difference is usually small, while the risk of duplicates increases.
6. Conclusion
In this article, we explored idempotence in Kafka and measured its impact on performance using a JMH benchmark.
The results show that idempotence doesn’t cause a significant throughput drop. The differences we observed were minor and within normal measurement variability.
As always, complete code examples are available over on GitHub.