Effect of Idempotence on the Performance of a Kafka Producer
Last updated: January 26, 2026
1. Introduction
In this tutorial, we’ll explore the concept of idempotence, how it applies to Kafka producers, and its impact on performance.
After introducing the core concepts, we’ll set up a benchmark using the Java Microbenchmark Harness (JMH) to measure how idempotent producers impact performance in Kafka.
Lastly, we’ll analyze the benchmark results to determine when idempotence should be disabled and when it should remain enabled.
2. Idempotence Meaning in Kafka
Let’s begin by understanding what idempotence means. In general terms, idempotence means that performing the same operation multiple times produces the same result as if it were performed once.
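As a minimal illustration of the general definition, applying an idempotent operation twice yields the same result as applying it once. Math.abs is a simple stand-in here, chosen purely for illustration:

```java
public class IdempotenceExample {
    public static void main(String[] args) {
        // Math.abs is idempotent: applying it a second time doesn't change the result.
        int once = Math.abs(-5);
        int twice = Math.abs(Math.abs(-5));
        System.out.println(once == twice); // true
    }
}
```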
In distributed systems, failures such as network issues, broker outages, and timeouts can cause Kafka producers to retry requests, potentially resulting in duplicate records. Luckily, idempotence in Kafka helps address these cases. When we enable idempotency for the Kafka producer, it guarantees that retrying a send operation doesn’t result in duplicate records being written into a topic.
This is implemented at the protocol level, without the need for deduplication on the application level. Each producer instance is assigned a unique identifier, and records are tracked using sequence numbers. In the event of the mentioned issues, this allows the broker to discard duplicate records that may be caused by retrying.
3. Idempotent Producer
As mentioned earlier, an idempotent producer uses a deduplication mechanism at the protocol level: the producer and broker coordinate to ensure that retries don’t result in duplicate message delivery. In other words, an idempotent producer aims to deliver each message exactly once.
In this section, we’ll see how to achieve that result.
3.1. Create an Idempotent Producer
Idempotent producers have been available in Kafka for some time, but they had to be enabled explicitly. Whether idempotence is enabled in the producer is controlled by a boolean property:
enable.idempotence=true
Since version 3.0, Kafka has enforced the strongest delivery guarantees by default, meaning idempotence is now enabled out of the box. However, it’s only enabled by default as long as no conflicting configuration is set. In other words, some producer settings can silently disable idempotence.
To make certain we have it enabled, we have to configure:
- acks = all to ensure that we safely acknowledge messages
- retries > 0 to allow retries (the default is Integer.MAX_VALUE)
- max.in.flight.requests.per.connection <= 5 to keep message ordering intact
It should be noted that the Kafka Java client throws ConfigException if idempotence is explicitly enabled and conflicting configurations are set.
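Putting these requirements together, a minimal safe configuration might look like the following sketch. For readability it uses the literal property names rather than the ProducerConfig constants, but the keys are the same:

```java
import java.util.Properties;

public class IdempotentProducerConfig {

    // Builds a Properties object with the settings required for idempotence.
    public static Properties build() {
        Properties props = new Properties();
        props.put("enable.idempotence", "true");
        props.put("acks", "all");                                  // required: wait for all in-sync replicas
        props.put("retries", Integer.toString(Integer.MAX_VALUE)); // required: retries must be > 0
        props.put("max.in.flight.requests.per.connection", "5");   // required: <= 5 to keep ordering intact
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("acks")); // all
    }
}
```

With any of these three properties set to a conflicting value while enable.idempotence=true, the Kafka client would reject the configuration at startup.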
3.2. How Idempotence Changes Producer Behavior
When we enable idempotence, both the producer and the broker alter their message delivery behavior to ensure exactly-once semantics.
First, the broker assigns a unique Producer ID (PID) to each producer. Next, the producer attaches a sequence number to each batch of records it sends for a given partition. The broker uses the PID and sequence number to identify retries and drop duplicate records rather than writing them again to the topic.
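To make the mechanism concrete, here is a toy, in-memory sketch of broker-side deduplication. This is not Kafka source code, just an illustration of how tracking the last sequence number per producer ID lets a log drop retried batches:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a broker partition log that deduplicates by (PID, sequence number).
public class DedupSketch {
    private final Map<Long, Integer> lastSeqByPid = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    // Appends the record only if its sequence number is new for this producer.
    public boolean append(long pid, int seq, String record) {
        Integer lastSeq = lastSeqByPid.get(pid);
        if (lastSeq != null && seq <= lastSeq) {
            return false; // duplicate caused by a retry: drop it
        }
        lastSeqByPid.put(pid, seq);
        log.add(record);
        return true;
    }

    public List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        DedupSketch broker = new DedupSketch();
        broker.append(42L, 0, "order-1");
        broker.append(42L, 0, "order-1"); // retry of the same batch: dropped
        broker.append(42L, 1, "order-2");
        System.out.println(broker.log()); // [order-1, order-2]
    }
}
```

The real protocol is more involved (it tracks sequence numbers per partition and per batch), but the core idea is the same: a retry reuses the same PID and sequence number, so the broker can recognize and discard it.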
With acks=all, the producer considers a record successfully delivered only after the broker confirms it has been written to all in-sync replicas of the partition. If the producer doesn’t receive an acknowledgment, it assumes the send operation failed and retries automatically, reusing the same PID and sequence number.
At the same time, this mechanism also introduces strict order requirements. We have to keep max.in.flight.requests.per.connection <= 5. Otherwise, if we allow too many concurrent requests, we could break sequence number validation.
Since sequence numbers must arrive in order and the broker must acknowledge each batch on all in-sync replicas before the producer considers it delivered, we can expect some performance cost. As a result of this behavior, we gain correctness, but at the expense of reduced parallelism and throughput.
It’s also worth noting that deduplication works only while the producer is active. After a restart, the producer loses its previous sequence numbers, so Kafka can no longer guarantee that records won’t be duplicated. This behavior can be addressed by transactions, which help to preserve the producer’s state.
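As a hedged sketch of that last point: setting a stable transactional.id (the id value below is an assumption for illustration) ties the producer’s state to a durable identity, and the Kafka client enables idempotence implicitly when transactions are configured:

```java
import java.util.Properties;

public class TransactionalProducerConfig {

    // Builds properties for a transactional (and therefore idempotent) producer.
    public static Properties build(String transactionalId) {
        Properties props = new Properties();
        // A stable transactional.id survives restarts and implies enable.idempotence=true.
        props.put("transactional.id", transactionalId);
        props.put("enable.idempotence", "true"); // stated explicitly for clarity
        props.put("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        // "orders-producer-1" is an illustrative id, not from the article.
        System.out.println(build("orders-producer-1").getProperty("transactional.id"));
    }
}
```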
4. Measuring the Performance Impact
Now that we understand how idempotent producers work, let’s see it in practice. In the following sections, we’ll configure the benchmark to measure the performance of an idempotent versus a non-idempotent producer.
Our goal is to see how idempotence affects the throughput. To improve the reliability of results, we’ll run the same benchmark in two environments. First, we’ll use a single broker without replication. Then, we’ll repeat the benchmark on a cluster of three brokers with a replication factor of three.
4.1. Benchmark Design
To create a stable environment, we use Docker Compose to run a Kafka cluster. To simplify configuration, each node runs both the broker and controller roles (KRaft mode) and exposes an external listener that the benchmark can connect to:
services:
  kafka-1:
    image: apache/kafka:3.9.0
    container_name: kafka-1
    ports:
      - "29092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: "broker,controller"
      KAFKA_LISTENERS: "PLAINTEXT://:19092,CONTROLLER://:19093,EXTERNAL://:9092"
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-1:19092,EXTERNAL://localhost:29092"
      KAFKA_MIN_INSYNC_REPLICAS: 2
      # ...omitted irrelevant config
We repeat the same configuration for the remaining brokers, changing only the node ID and port. For the single broker scenario, we use the same setup but run only one broker and create topics with a replication factor of one.
Regarding the producer setup, both configurations are identical except for the idempotence flag, which is parameterized in the JMH benchmark. Initially, we set required properties acks=all, infinite retries, and a safe limit for the number of requests. Subsequently, we configure explicit batching and timeouts to reduce variability between runs, making our benchmark more stable and trustworthy:
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5");
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, String.valueOf(idempotent));
props.put(ProducerConfig.LINGER_MS_CONFIG, "5");
props.put(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(32 * 1024));
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "none");
props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "30000");
props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000");
Finally, the benchmark method sends records asynchronously, collects futures, and then waits for each one of them to complete:
@Benchmark
@OperationsPerInvocation(MESSAGES)
public void sendMessages() throws Exception {
    @SuppressWarnings("unchecked")
    Future<RecordMetadata>[] futures = new Future[MESSAGES];
    for (int i = 0; i < MESSAGES; i++) {
        long key = counter++;
        futures[i] = producer.send(new ProducerRecord<>(topic, key, value));
    }
    for (Future<RecordMetadata> f : futures) {
        f.get();
    }
}
The reason for this approach is measuring how fast the producer sends requests, the broker persists them across partitions and replicas, and then acknowledges that the events are delivered. That way, we measure real end-to-end producer throughput while allowing Kafka to operate efficiently.
4.2. Interpreting the Results
Starting with a simpler scenario, using a single broker with no replication, we get the following results:
Benchmark (idempotent) Mode Cnt Score Error Units
IdempotenceBenchmark.sendMessages true thrpt 10 24891.897 ± 263.271 ops/s
IdempotenceBenchmark.sendMessages false thrpt 10 24953.439 ± 313.723 ops/s
The score represents the throughput, measured in records per second. In other words, it indicates the number of records the producer successfully sends in one second. The difference between the idempotent and non-idempotent producers is small relative to the overall throughput. Some variation is expected in this type of benchmark.
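As a back-of-the-envelope check of how the score is computed (the numbers below are assumptions, not benchmark output), @OperationsPerInvocation(MESSAGES) tells JMH to count each record as one operation, so the reported ops/s reflects records per second rather than benchmark-method calls per second:

```java
public class ThroughputCheck {
    public static void main(String[] args) {
        int messages = 1_000;            // records sent per benchmark invocation (assumed)
        double invocationSeconds = 0.04; // measured time of one invocation (assumed)
        // One "op" is a single record thanks to @OperationsPerInvocation,
        // so throughput is records divided by invocation time.
        double recordsPerSecond = messages / invocationSeconds;
        System.out.println(recordsPerSecond); // 25000.0
    }
}
```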
Next, we run the benchmark on a three-broker cluster with a replication factor of three:
Benchmark (idempotent) Mode Cnt Score Error Units
IdempotenceBenchmark.sendMessages true thrpt 10 23689.246 ± 568.638 ops/s
IdempotenceBenchmark.sendMessages false thrpt 10 24097.398 ± 500.293 ops/s
The throughput values for both producers remain very close. This level of variation is expected in performance benchmarks, and in some runs, we can even observe the idempotent producer achieving a slightly higher score. At this throughput level, Kafka can process roughly 24000 records per second. Under such conditions, small differences in throughput are almost negligible.
With idempotence enabled, we don’t measure a significant throughput drop, even when using multiple brokers and replicas. These differences can vary from run to run, but the results stay consistent in both scenarios.
5. When to Enable Idempotent Producers
Based on Kafka’s delivery guarantees and our benchmark results, we can see that idempotence doesn’t significantly reduce throughput, but it provides a strong deduplication mechanism. Kafka aims to deliver every record reliably, so idempotence is enabled by default in newer versions.
It’s safe to keep idempotence enabled when avoiding duplicate records and when we want strong delivery guarantees without implementing custom application-level deduplication.
On the other hand, if our primary goal is absolute maximum throughput and duplicates are acceptable, we can disable idempotence. Even then, the performance difference is usually small, while the risk of duplicates increases.
6. Conclusion
In this article, we explored idempotence in Kafka and measured its impact on performance using a JMH benchmark.
The results show that idempotence doesn’t cause a significant throughput drop. The differences we observed were minor and within normal measurement variability.
As always, complete code examples are available over on GitHub.