Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE

1. Overview

Apache Kafka is an open-source distributed event streaming platform.

In this quick tutorial, we’ll learn techniques for getting the number of messages in a Kafka topic. We’ll demonstrate both a programmatic technique and native Kafka commands.

2. Programmatic Technique

A Kafka topic may have multiple partitions. Our technique should make sure we count the messages from every partition.

We have to go through each partition and check its latest offset. For this, we’ll introduce a consumer:

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
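The props object is not shown above. As a minimal sketch, it could be built as follows, assuming a broker at localhost:9092 (the address used by the commands later in this article) and String keys and values. The class and method names here are hypothetical and only serve the illustration:

```java
import java.util.Properties;

class ConsumerPropsSketch {

    // Builds a minimal consumer configuration.
    // Assumption: the caller supplies the broker address, e.g. "localhost:9092".
    static Properties consumerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Matches KafkaConsumer<String, String>: keys and values are read as Strings.
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // We only inspect offsets, so there is nothing to commit.
        props.put("enable.auto.commit", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("localhost:9092");
        System.out.println(props.getProperty("bootstrap.servers"));
    }
}
```

The resulting Properties instance can then be passed to the KafkaConsumer constructor.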

The second step is to get all the partitions from this consumer:

List<TopicPartition> partitions = consumer.partitionsFor(topic)
    .stream()
    .map(p -> new TopicPartition(topic, p.partition()))
    .collect(Collectors.toList());

The third step is to move the consumer to the end of each partition and record the resulting positions in a map:

consumer.assign(partitions);
// an empty collection means "all currently assigned partitions"
consumer.seekToEnd(Collections.emptySet());
Map<TopicPartition, Long> endPartitions = partitions.stream()
    .collect(Collectors.toMap(Function.identity(), consumer::position));

The final step is to take the last position in each partition and sum them to get the number of messages in the topic:

long numberOfMessages = partitions.stream().mapToLong(endPartitions::get).sum();
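The sum of the end offsets matches the message count only while every partition still starts at offset 0. Once records have been removed, for example by retention, a closer estimate is the difference between the end and beginning offsets of each partition. The sketch below shows that arithmetic on plain maps. The class name, method name and sample offsets are hypothetical; with a real consumer, the two maps could come from consumer.beginningOffsets(partitions) and consumer.endOffsets(partitions):

```java
import java.util.Map;

class OffsetMath {

    // Sums (end - beginning) over all partitions.
    // P is the partition key type, e.g. TopicPartition or a plain partition number.
    static <P> long countMessages(Map<P, Long> beginningOffsets, Map<P, Long> endOffsets) {
        long total = 0;
        for (Map.Entry<P, Long> end : endOffsets.entrySet()) {
            // Assumption: a partition missing from the beginning map starts at offset 0.
            long begin = beginningOffsets.getOrDefault(end.getKey(), 0L);
            total += end.getValue() - begin;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical offsets for a topic with two partitions.
        Map<Integer, Long> begin = Map.of(0, 2L, 1, 0L);
        Map<Integer, Long> end = Map.of(0, 5L, 1, 4L);
        System.out.println(countMessages(begin, end)); // prints 7
    }
}
```

For compacted topics, even this difference is only an upper bound, since compaction removes records without moving the offsets.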

3. Kafka Native Commands

Programmatic techniques are useful when we want to automate tasks based on the number of messages in a Kafka topic. However, if we only need the number for analysis, creating such a service and running it on a machine is unnecessary overhead. A more straightforward option is to use native Kafka commands, which give quick results.

3.1. Using GetOffsetShell Command

Before executing native commands, we have to navigate to Kafka’s root folder on the machine. The following command returns the number of messages published to the topic baeldung:

$ bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 \
  --topic baeldung | awk -F ":" '{sum += $3} END {print "Result: "sum}'
Result: 3
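GetOffsetShell prints one line per partition in the form topic:partition:offset, and the awk script adds up the third field. For readers less familiar with awk, here is the same summation as a minimal Java sketch. The class name, method name and sample lines are hypothetical:

```java
import java.util.List;

class OffsetShellOutput {

    // Sums the last colon-separated field of each "topic:partition:offset" line.
    // A non-blank line without a numeric last field makes Long.parseLong throw.
    static long sumOffsets(List<String> lines) {
        long sum = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            int lastColon = trimmed.lastIndexOf(':');
            sum += Long.parseLong(trimmed.substring(lastColon + 1));
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical output for a topic with two partitions.
        List<String> lines = List.of("baeldung:0:1", "baeldung:1:2");
        System.out.println("Result: " + sumOffsets(lines)); // prints Result: 3
    }
}
```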

3.2. Using Consumer Console

As discussed earlier, we’ll navigate to Kafka’s root folder before executing any commands. The following command consumes the topic baeldung from the beginning and reports how many messages it processed:

$ bin/kafka-console-consumer.sh --from-beginning --bootstrap-server localhost:9092 \
  --property print.key=true --property print.value=false --property print.partition \
  --topic baeldung --timeout-ms 5000 | tail -n 10 | grep "Processed a total of"
Processed a total of 3 messages

4. Conclusion

In this article, we’ve looked at techniques for getting the number of messages in a Kafka topic. We learned a programmatic technique that assigns all partitions to a consumer and checks the latest offset of each one.

We also saw two techniques based on native Kafka commands. One was the GetOffsetShell command from Kafka tools. The other was running a consumer on the console from the beginning of the topic and printing the number of messages it processed.

As always, the source code of this article can be found over on GitHub.
