Exactly Once Processing in Kafka with Java

1. Overview

In this tutorial, we’ll look at how Kafka ensures exactly-once delivery between producer and consumer applications through its Transactional API.

Additionally, we’ll use this API to implement transactional producers and consumers to achieve end-to-end exactly-once delivery in a WordCount example.

2. Message Delivery in Kafka

Due to various failures, messaging systems can’t guarantee message delivery between producer and consumer applications. Depending on how the client applications interact with such systems, the following message semantics are possible:

If a messaging system will never duplicate a message but might miss the occasional message, we call that at-most-once.
Or, if it will never miss a message but might duplicate the occasional message, we call it at-least-once.
But, if it always delivers all messages without duplication, that is exactly-once.
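
To make the difference concrete, here is a minimal sketch in plain Java (no Kafka involved) of a consumer that crashes once and then restarts from its last committed offset. The class name DeliverySemanticsDemo and the single-crash model are assumptions made purely for illustration. Committing before processing loses a message, while committing after processing repeats one:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (hypothetical, not Kafka code): one crash, then a restart from the committed offset. */
final class DeliverySemanticsDemo {

    static List<String> run(List<String> messages, int crashIndex, boolean commitBeforeProcessing) {
        List<String> processed = new ArrayList<>();
        int committed = 0;       // offset the consumer resumes from after a restart
        int position = 0;        // offset currently being handled
        boolean crashed = false; // the simulated crash happens only once
        while (position < messages.size()) {
            boolean crashNow = !crashed && position == crashIndex;
            if (commitBeforeProcessing) {
                committed = position + 1;               // step 1: commit
            } else {
                processed.add(messages.get(position));  // step 1: process
            }
            if (crashNow) {
                // crash between the two steps, then restart from the committed offset
                crashed = true;
                position = committed;
                continue;
            }
            if (commitBeforeProcessing) {
                processed.add(messages.get(position));  // step 2: process
            } else {
                committed = position + 1;               // step 2: commit
            }
            position++;
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("a", "b", "c");
        System.out.println(run(messages, 1, true));  // at-most-once: [a, c]
        System.out.println(run(messages, 1, false)); // at-least-once: [a, b, b, c]
    }
}
```

Exactly-once requires making the "process" and "commit" steps atomic, which is what the transactional API provides.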

Initially, Kafka only supported at-most-once and at-least-once message delivery.

However, introducing Transactions between Kafka brokers and client applications ensures exactly-once delivery in Kafka. To understand it better, let’s quickly review the transactional client API.

3. Maven Dependencies

To work with the transaction API, we’ll need Kafka’s Java client in our pom:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.9.0</version>
</dependency>

4. A Transactional consume-transform-produce Loop

For our example, we’re going to consume messages from an input topic, sentences.

Then for each sentence, we’ll count every word and send the individual word counts to an output topic, counts.

In the example, we’ll assume that there is already transactional data available in the sentences topic.

4.1. A Transaction-Aware Producer

So, let’s first add a typical Kafka producer.

Properties producerProps = new Properties();
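producerProps.put("bootstrap.servers", "localhost:9092");
producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");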

Additionally, though, we need to specify a transactional.id and enable idempotence:

producerProps.put("enable.idempotence", "true");
producerProps.put("transactional.id", "prod-1");

KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);

Because we’ve enabled idempotence, the broker can deduplicate any message this producer retries. It does so by tracking a producer ID together with per-partition sequence numbers, and the transactional.id lets the producer keep the same identity across restarts.

Simply put, if the producer accidentally sends the same message to Kafka more than once, these settings enable the broker to notice and discard the duplicate.

We need only make sure the transaction ID is distinct for each producer but consistent across restarts.
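
As a rough illustration of the idea, the sketch below models broker-side de-duplication in plain Java. It is a deliberately simplified assumption, not Kafka’s actual implementation: the class IdempotentLogModel is hypothetical, and it keeps one sequence counter per producer ID, whereas the real broker tracks sequences per producer and partition and also rejects out-of-order sequences.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical model of sequence-number de-duplication. */
final class IdempotentLogModel {
    private final Map<Long, Integer> lastSequenceByProducer = new HashMap<>();
    private int appended = 0;

    /** Appends the record unless this producer already delivered that sequence number. */
    boolean append(long producerId, int sequence) {
        int last = lastSequenceByProducer.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return false; // a retry of something already written: do not append again
        }
        lastSequenceByProducer.put(producerId, sequence);
        appended++;
        return true;
    }

    int size() {
        return appended;
    }

    public static void main(String[] args) {
        IdempotentLogModel log = new IdempotentLogModel();
        log.append(7L, 0);
        log.append(7L, 1);
        log.append(7L, 1); // retried send, ignored
        System.out.println(log.size()); // 2
    }
}
```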

4.2. Enabling the Producer for Transactions

Once the producer is configured, we also need to call initTransactions to prepare it to use transactions:

producer.initTransactions();

This registers the producer with the broker as one that can use transactions, identifying it by its transactional.id and a sequence number, or epoch. In turn, the broker will use these to write-ahead any actions to a transaction log.

And consequently, the broker will remove any actions from that log that belong to a producer with the same transaction id and earlier epoch, presuming them to be from defunct transactions.
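
The fencing behaviour described above can be sketched as a small in-memory model. Everything here is an illustrative assumption: EpochFencingModel, register and accepts are hypothetical names, and the real transaction coordinator does considerably more. The point is only that each registration bumps the epoch and that writes carrying an older epoch are refused:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of how a coordinator could fence stale producers by epoch. */
final class EpochFencingModel {
    private final Map<String, Integer> currentEpoch = new HashMap<>();

    /** Models initTransactions(): returns 0 on first registration, otherwise the bumped epoch. */
    int register(String transactionalId) {
        return currentEpoch.merge(transactionalId, 0, (old, ignored) -> old + 1);
    }

    /** A write is accepted only from the most recently registered instance. */
    boolean accepts(String transactionalId, int epoch) {
        Integer current = currentEpoch.get(transactionalId);
        return current != null && current == epoch;
    }

    public static void main(String[] args) {
        EpochFencingModel coordinator = new EpochFencingModel();
        int oldInstance = coordinator.register("prod-1");
        int newInstance = coordinator.register("prod-1"); // same id after a restart
        System.out.println(coordinator.accepts("prod-1", oldInstance)); // false
        System.out.println(coordinator.accepts("prod-1", newInstance)); // true
    }
}
```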

4.3. A Transaction-Aware Consumer

When we consume, we can read all the messages on a topic partition in order. However, we can indicate with isolation.level that we should wait to read transactional messages until the associated transaction has been committed:

Properties consumerProps = new Properties();
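consumerProps.put("bootstrap.servers", "localhost:9092");
consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");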
consumerProps.put("group.id", "my-group-id");
consumerProps.put("enable.auto.commit", "false");
consumerProps.put("isolation.level", "read_committed");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
consumer.subscribe(singleton("sentences"));

Using a value of read_committed ensures that we don’t read any transactional messages before the transaction completes.

The default value of isolation.level is read_uncommitted.
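
To see what the two isolation levels mean for a reader, consider the simplified model below. It is only a sketch under stated assumptions: IsolationLevelModel, Entry and TxnState are hypothetical, and every record is assumed to belong to a transaction. With read_committed, the reader skips aborted records and stops at the first record whose transaction is still open; with read_uncommitted, it returns everything:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of what a consumer sees on one partition under each isolation level. */
final class IsolationLevelModel {
    enum TxnState { OPEN, COMMITTED, ABORTED }

    static final class Entry {
        final String value;
        final TxnState state;

        Entry(String value, TxnState state) {
            this.value = value;
            this.state = state;
        }
    }

    static List<String> read(List<Entry> partitionLog, boolean readCommitted) {
        List<String> visible = new ArrayList<>();
        for (Entry entry : partitionLog) {
            if (!readCommitted) {
                visible.add(entry.value); // read_uncommitted: everything is visible
            } else if (entry.state == TxnState.OPEN) {
                break;                    // wait here until the transaction completes
            } else if (entry.state == TxnState.COMMITTED) {
                visible.add(entry.value); // aborted records are skipped
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Entry> log = Arrays.asList(
          new Entry("a", TxnState.COMMITTED),
          new Entry("b", TxnState.ABORTED),
          new Entry("c", TxnState.OPEN),
          new Entry("d", TxnState.COMMITTED));
        System.out.println(read(log, true));  // [a]
        System.out.println(read(log, false)); // [a, b, c, d]
    }
}
```

Note that "d" stays hidden under read_committed even though its own transaction committed, because records are delivered in offset order and an earlier transaction is still open.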

4.4. Consuming and Transforming by Transaction

Now that we have the producer and consumer both configured to write and read transactionally, we can consume records from our input topic and count each word in each record:

ConsumerRecords<String, String> records = consumer.poll(ofSeconds(60));
// count words across every partition returned by poll,
// since we later commit offsets for all of those partitions
Map<String, Integer> wordCountMap =
  StreamSupport.stream(records.spliterator(), false)
    .flatMap(record -> Stream.of(record.value().split(" ")))
    .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum));

Note that nothing in this read is transactional yet. However, because we set read_committed, the consumer won’t see messages that were written to the input topic as part of a transaction until that transaction has been committed.
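
One way to keep the transformation easy to test without a broker is to pull it into a pure function. The sketch below assumes a hypothetical helper class, WordCounter, that takes the sentence values directly; it splits on any whitespace and returns a sorted map, which are choices made for this example only:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper: the word-count transformation without any Kafka types. */
final class WordCounter {

    static Map<String, Integer> count(List<String> sentences) {
        return sentences.stream()
          .flatMap(sentence -> Arrays.stream(sentence.trim().split("\\s+")))
          .filter(word -> !word.isEmpty()) // a blank sentence yields one empty token
          .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("the quick fox", "the lazy dog")));
        // {dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```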

Now, we can send the calculated word count to the output topic.

Let’s see how we can produce our results, also transactionally.

4.5. Send API

To send our counts as new messages, but in the same transaction, we call beginTransaction:

producer.beginTransaction();

Then, we can write each one to our “counts” topic with the key being the word and the count being the value:

wordCountMap.forEach((key,value) -> 
    producer.send(new ProducerRecord<String,String>("counts",key,value.toString())));

Note that because the producer can partition the data by the key, transactional messages can span multiple partitions, each being read by separate consumers. Therefore, the Kafka broker will store a list of all updated partitions for a transaction.

Note also that, within a transaction, a producer can use multiple threads to send records in parallel.

4.6. Committing Offsets

Finally, we need to commit the offsets that we just finished consuming. With transactions, we don’t call commitSync or commitAsync on the consumer. Instead, we send the consumed offsets through the producer so that they are committed as part of the same transaction as the output records.

We can do all of this in a single call, but we first need to calculate the offsets for each topic partition:

Map<TopicPartition, OffsetAndMetadata> offsetsToCommit = new HashMap<>();
for (TopicPartition partition : records.partitions()) {
    List<ConsumerRecord<String, String>> partitionedRecords = records.records(partition);
    long offset = partitionedRecords.get(partitionedRecords.size() - 1).offset();
    offsetsToCommit.put(partition, new OffsetAndMetadata(offset + 1));
}

Note that what we commit to the transaction is the upcoming offset, meaning we need to add 1.
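
The "plus one" rule is easy to get wrong, so here is a tiny worked example in plain Java. The class NextOffsetModel and the use of strings such as "sentences-0" to stand in for topic partitions are assumptions for illustration only:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model: the offset to commit is the last processed offset plus one. */
final class NextOffsetModel {

    static Map<String, Long> offsetsToCommit(Map<String, List<Long>> processedOffsetsByPartition) {
        Map<String, Long> next = new HashMap<>();
        processedOffsetsByPartition.forEach((partition, offsets) -> {
            if (!offsets.isEmpty()) {
                // commit the position of the next record to read, not the last one read
                next.put(partition, offsets.get(offsets.size() - 1) + 1);
            }
        });
        return next;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> processed = new HashMap<>();
        processed.put("sentences-0", Arrays.asList(5L, 6L, 7L));
        System.out.println(offsetsToCommit(processed)); // {sentences-0=8}
    }
}
```

If we committed 7 instead of 8 here, a restarted consumer would read the record at offset 7 a second time.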

Then, we can send our calculated offsets to the transaction:

producer.sendOffsetsToTransaction(offsetsToCommit, consumer.groupMetadata());

4.7. Committing or Aborting the Transaction

And finally, we can commit the transaction, which will atomically write the offsets to the __consumer_offsets topic together with the output records:

producer.commitTransaction();

This flushes any buffered message to the respective partitions. In addition, the Kafka broker makes all messages in that transaction available to the consumers.

Of course, if anything goes wrong while we are processing, for example, if we catch an exception, we can call abortTransaction:

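try {
  // ... read from input topic
  // ... transform
  // ... write to output topic
  producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
  // fatal for this producer instance: we can't abort, only close it
  producer.close();
} catch (Exception e) {
  // anything else: abort so the whole batch can be retried
  producer.abortTransaction();
}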

Aborting drops any buffered messages and removes the transaction from the broker.

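If we neither commit nor abort before the producer’s transaction.timeout.ms elapses, the Kafka broker will abort the transaction itself. The default value for this property is 60,000 milliseconds or 1 minute. The broker rejects values larger than its own transaction.max.timeout.ms, which defaults to 900,000 milliseconds or 15 minutes.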

5. Other consume-transform-produce Loops

We’ve just seen a basic consume-transform-produce loop that reads and writes to the same Kafka cluster.

Conversely, applications that read from and write to different Kafka clusters can’t use these transactions and must fall back on the older commitSync and commitAsync API. Typically, such applications store consumer offsets in their own external state storage to maintain transactionality.

6. Conclusion

For data-critical applications, end-to-end exactly-once processing is often imperative.

In this tutorial, we saw how we use Kafka to do exactly this, using transactions, and we implemented a transaction-based word counting example to illustrate the principle.

The code backing this article is available on GitHub.