
1. Overview

In Kafka, consumers read messages from partitions. While reading messages, there are some concerns to consider, such as determining which messages to read from a partition, or preventing duplicate reads and message loss in case of failure. The solution to these concerns is the use of offsets.

In this tutorial, we’ll learn about offsets in Kafka. We’ll see how to commit offsets to manage message consumption, and discuss the various commit methods and their drawbacks.

2. What Is an Offset?

We know that Kafka stores messages in topics, and each topic can have multiple partitions. Within a consumer group, each partition is read by a single consumer. Kafka, with the help of offsets, keeps track of the messages that consumers have read. An offset is an integer, starting from zero, that increments by one as each message gets stored in a partition.

Let’s say a consumer has read five messages from a partition. Then, based on the configuration, Kafka marks the offsets up to 4 as committed (offsets are a zero-based sequence). The next time the consumer attempts to read messages, it consumes from offset 5 onwards.

Without offsets, there is no way to avoid duplicate processing or data loss. That’s why they’re so crucial.

We can make an analogy with database storage. In a database, we commit after executing SQL statements to persist the changes. In the same way, after reading from the partition, we commit offsets to mark the position of the processed message.

3. Ways to Commit Offsets

There are four ways to commit offsets. We’ll look at each in detail and discuss their use cases, advantages, and disadvantages.

Let’s start by adding the Kafka Client API dependency to the pom.xml:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.6.1</version>
</dependency>

3.1. Auto Commit

This is the simplest way to commit offsets. Kafka, by default, uses auto-commit: every five seconds, it commits the largest offset returned by the poll() method. Here, poll() returns a batch of messages and waits up to 10 seconds for them, as we can see in the code:

KafkaConsumer<Long, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(KafkaConfigProperties.getTopic());
ConsumerRecords<Long, String> messages = consumer.poll(Duration.ofSeconds(10));
for (ConsumerRecord<Long, String> message : messages) {
  // process the message
}
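
For reference, here’s a minimal sketch of the props used above, assuming a local broker on localhost:9092 and a hypothetical group id. With auto-commit, enable.auto.commit keeps its default value of true, and auto.commit.interval.ms controls the five-second commit interval:

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed local broker
props.put(ConsumerConfig.GROUP_ID_CONFIG, "baeldung-offset-group");   // hypothetical group id
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
// auto-commit is on by default; the interval defaults to five seconds
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "5000");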

The problem with auto-commit is that there is a very high chance of data loss in case of application failure. When poll() returns the messages, Kafka may commit the largest offset before the consumer has finished processing them.

Let’s say poll() returns 100 messages, and the consumer has processed 60 of them when the auto-commit happens. Then, due to some failure, the consumer crashes. When a new consumer goes live to read messages, it starts reading from message 101, resulting in the loss of messages 61 through 100.

Thus, we need other approaches that don’t have this drawback. The answer is manual commit.

3.2. Manual Sync Commit

In manual commits, whether sync or async, it’s necessary to disable auto-commit by setting the enable.auto.commit property to false:

Properties props = new Properties();
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

After disabling auto-commit, let’s now understand the use of commitSync():

KafkaConsumer<Long, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(KafkaConfigProperties.getTopic());
ConsumerRecords<Long, String> messages = consumer.poll(Duration.ofSeconds(10));
// process the messages
consumer.commitSync();

This method prevents data loss by committing the offset only after processing the messages. However, it doesn’t prevent duplicate reading when a consumer crashes before committing the offset. Besides this, it also impacts application performance.
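
It’s also worth noting that once commitSync() exhausts its retries, for example because the consumer group has rebalanced, it throws a CommitFailedException, so wrapping the commit in a try/catch is a common pattern. Here’s a minimal sketch, reusing the consumer from above:

try {
    ConsumerRecords<Long, String> messages = consumer.poll(Duration.ofSeconds(10));
    for (ConsumerRecord<Long, String> message : messages) {
        // process the message
    }
    consumer.commitSync();
} catch (CommitFailedException e) {
    // the commit can't be retried, for example because the partitions were
    // reassigned during a rebalance; log it and decide whether to continue
}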

commitSync() blocks the calling code until the commit completes. Also, in case of a recoverable error, it keeps on retrying. This decreases the throughput of the application, which we don’t want. So, Kafka provides another solution, the async commit, that deals with these drawbacks.

3.3. Manual Async Commit

Kafka provides commitAsync() to commit offsets asynchronously. It overcomes the performance overhead of the manual sync commit by sending the commit request and continuing without waiting for the broker’s response. Let’s implement an async commit to understand this:

KafkaConsumer<Long, String> consumer = new KafkaConsumer<>(props); 
consumer.subscribe(KafkaConfigProperties.getTopic()); 
ConsumerRecords<Long, String> messages = consumer.poll(Duration.ofSeconds(10));
// process the messages
consumer.commitAsync();

The problem with the async commit is that it doesn’t retry in case of failure. It relies on the next call of commitAsync() to commit the latest offset.

Suppose 300 is the largest offset we want to commit, but our commitAsync() fails due to some issue. Since it’s asynchronous, it could happen that before a retry takes place, a later call of commitAsync() commits a larger offset of 400. If the failed commitAsync() then retried and committed offset 300 successfully, it would overwrite the previous commit of 400, resulting in duplicate reading. That is why commitAsync() doesn’t retry.
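
Because of this, a common pattern is to pass an OffsetCommitCallback to commitAsync() so that failures are at least logged, and to fall back to a blocking commitSync() just before closing the consumer. Here’s a minimal sketch of that combination, reusing the consumer from above:

try {
    while (true) {
        ConsumerRecords<Long, String> messages = consumer.poll(Duration.ofSeconds(10));
        for (ConsumerRecord<Long, String> message : messages) {
            // process the message
        }
        consumer.commitAsync((offsets, exception) -> {
            if (exception != null) {
                // log the failed offsets; we don't retry here to avoid committing out of order
            }
        });
    }
} finally {
    try {
        consumer.commitSync(); // one final blocking commit before shutting down
    } finally {
        consumer.close();
    }
}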

3.4. Commit Specific Offset

Sometimes, we need more control over offsets. Let’s say we’re processing messages in small batches and want to commit the offsets as soon as each batch is processed. We can use the overloaded versions of commitSync() and commitAsync() that take a map argument to commit specific offsets:

KafkaConsumer<Long, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(KafkaConfigProperties.getTopic());
Map<TopicPartition, OffsetAndMetadata> currentOffsets = new HashMap<>();
int messageProcessed = 0;
while (true) {
    ConsumerRecords<Long, String> messages = consumer.poll(Duration.ofSeconds(10));
    for (ConsumerRecord<Long, String> message : messages) {
        // process one message
        messageProcessed++;
        currentOffsets.put(
            new TopicPartition(message.topic(), message.partition()),
            new OffsetAndMetadata(message.offset() + 1));
        if (messageProcessed % 50 == 0) {
            consumer.commitSync(currentOffsets);
        }
    }
}

In this code, we manage a currentOffsets map, which takes TopicPartition as the key and OffsetAndMetadata as the value. During message processing, we insert the TopicPartition and OffsetAndMetadata of each processed message into the currentOffsets map. Note that we store message.offset() + 1 because the committed offset should point to the next message to read. When the number of processed messages reaches a multiple of fifty, we call commitSync() with the currentOffsets map to mark these messages as committed.

The behavior of this approach is the same as that of the sync and async commits. The only difference is that here we decide which offsets to commit, rather than leaving it to Kafka.
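
If we want the non-blocking behavior here as well, we can pass the same map (and, optionally, a callback) to commitAsync(), which also accepts it. A minimal sketch, reusing the currentOffsets map from above:

consumer.commitAsync(currentOffsets, (offsets, exception) -> {
    if (exception != null) {
        // log the offsets that failed to commit
    }
});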

4. Conclusion

In this article, we learned about offsets and their importance in Kafka. Further, we explored the four ways to commit offsets, both automatic and manual. Lastly, we analyzed their respective pros and cons. We can conclude that there is no single best way to commit offsets in Kafka; rather, the right choice depends on the specific use case.

All the code examples used in this article are available over on GitHub.
