1. Overview
Kafka Streams is an Apache Kafka client library for building real-time, event-driven applications that process data streams.
Like any production system, stream processing can fail for many reasons, such as network timeouts, corrupt data, or processing errors.
In this tutorial, we’ll learn how to handle various exceptions in a Kafka Streams application. We’ll implement an exception handling mechanism and test what happens when exceptions occur during stream processing.
2. Example Application With Kafka Streams
Let’s imagine we need to build a simple Kafka stream service that aggregates and sends data.
2.1. Maven Dependencies
First, we’ll include the kafka-clients and kafka-streams dependencies:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.9.0</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>3.9.0</version>
</dependency>
Next, we’ll define the data model.
2.2. Define the Data Model
We’ll define a User record, which the producer and consumer will use:
public record User(String id, String name, String country) {
}
The producer will send the User data to the stream service.
2.3. Implement the Serializer and Deserializer
Our Kafka streaming service will then need serializer and deserializer classes for the above User data.
First, we’ll implement a custom UserSerializer class by implementing the Serializer interface:
public class UserSerializer implements Serializer<User> {

    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public byte[] serialize(String topic, User user) {
        if (user == null) {
            return null;
        }
        try {
            return mapper.writeValueAsBytes(user);
        } catch (JsonProcessingException ex) {
            throw new RuntimeException(ex);
        }
    }
}
In the code above, we’re serializing the User object into a byte array.
Next, let’s write the UserDeserializer class by implementing the Deserializer’s deserialize() method:
public class UserDeserializer implements Deserializer<User> {

    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public User deserialize(String topic, byte[] bytes) {
        if (bytes == null || bytes.length == 0) {
            return null;
        }
        try {
            return mapper.readValue(bytes, User.class);
        } catch (IOException ex) {
            // log the failure before rethrowing; this is the message we'll verify in the tests
            log.error("Error deserializing the message for topic {}", topic, ex);
            throw new RuntimeException(ex);
        }
    }
}
Finally, we’ll implement the UserSerde class by extending the Serdes.WrapperSerde class:
public class UserSerde extends Serdes.WrapperSerde<User> {
    public UserSerde() {
        super(new UserSerializer(), new UserDeserializer());
    }
}
The streaming service uses the Serde class to supply the serialization/deserialization implementation that materializes the records.
2.4. Implement the Streaming Service
The streaming service will consume the User messages from the user-topic topic. Then, it’ll aggregate the user counts by country and send them to an outbound topic.
First, we’ll define the required stream-related configs in the UserStreamService class:
private static Properties getStreamProperties(String bootstrapServer) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "user-country-aggregator" + UUID.randomUUID());
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    return props;
}
Next, we’ll implement the streaming topology with the KStream and KTable instances in the above class:
public void start(String bootstrapServer) {
    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, User> userStream = builder.stream(
      "user-topic", Consumed.with(Serdes.String(), new UserSerde()));

    KTable<String, Long> usersPerCountry = userStream
      .filter((key, user) -> Objects.nonNull(user) && user.country() != null && !user.country().isEmpty())
      .groupBy((key, user) -> user.country(), Grouped.with(Serdes.String(), new UserSerde()))
      .count(Materialized.as("users_per_country_store"));

    usersPerCountry.toStream()
      .to("users_per_country", Produced.with(Serdes.String(), Serdes.Long()));
}
In the above code, we create a KStream object by passing the inbound topic and UserSerde instance.
We then create a KTable object from the KStream instance to hold the aggregated user count per country. The aggregation filters out invalid data, groups by country, and stores the count.
Then, we send the usersPerCountry aggregates to the outbound topic users_per_country.
We should note that any exceptions in the above filter/aggregation can be handled with standard try-catch semantics.
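For instance, here’s a minimal sketch that wraps the filter predicate in a try-catch. The log message and the choice to drop the failing record are our own illustration rather than part of the service above:
.filter((key, user) -> {
    try {
        return Objects.nonNull(user) && user.country() != null && !user.country().isEmpty();
    } catch (RuntimeException ex) {
        // log and drop the problematic record instead of failing the stream task
        log.error("Error filtering user with key {}", key, ex);
        return false;
    }
})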
Finally, we’ll instantiate the KafkaStreams object by passing the StreamsBuilder and the earlier defined properties:
Properties props = getStreamProperties(bootstrapServer);
kafkaStreams = new KafkaStreams(builder.build(), props);
kafkaStreams.start();
The KafkaStreams start() method will execute the stream pipeline.
3. Handling Exceptions in Kafka Streams
Stream processing can fail for various reasons, such as deserialization errors, broker unavailability, processing errors, or production exceptions in the outbound topic. Kafka provides built-in exception-handling classes.
We’ll look at how to handle each of these exception types.
3.1. Deserialization Exception
For deserialization-related errors, we can include the default deserialization exception handler property in the stream properties:
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
  LogAndContinueExceptionHandler.class);
The LogAndContinueExceptionHandler class logs the error and continues processing the other messages in the stream.
Alternatively, we can use the LogAndFailExceptionHandler class to fail the stream processing instead.
It’s also possible to provide a custom handler by extending the LogAndContinueExceptionHandler or implementing the DeserializationExceptionHandler interface. This enhances exception handling, for example, by allowing us to send failed messages to a dead-letter queue.
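As a minimal sketch, assuming a hypothetical user-topic-dlq dead-letter topic and leaving the producer wiring to the configure() callback, such a handler could look like this:
public class DeadLetterDeserializationExceptionHandler implements DeserializationExceptionHandler {

    private Producer<byte[], byte[]> dlqProducer;

    @Override
    public void configure(Map<String, ?> configs) {
        // create dlqProducer from the supplied configs (omitted for brevity)
    }

    @Override
    public DeserializationHandlerResponse handle(ErrorHandlerContext context,
      ConsumerRecord<byte[], byte[]> record, Exception exception) {
        // forward the raw bytes to the dead-letter topic, then keep processing
        dlqProducer.send(new ProducerRecord<>("user-topic-dlq", record.key(), record.value()));
        return DeserializationHandlerResponse.CONTINUE;
    }
}
We’d then register this class through the same DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG property.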
3.2. Production Exception
The stream can also fail while sending the aggregated messages to the outbound topic.
We can include the default production handler class in the stream properties:
props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
  DefaultProductionExceptionHandler.class);
The above default handler fails the entire stream processing in case of any exception.
Alternatively, we can implement a custom handler by implementing the ProductionExceptionHandler interface and updating the same config:
public class CustomProductionExceptionHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(ErrorHandlerContext context,
      ProducerRecord<byte[], byte[]> record, Exception exception) {
        log.error("ProductionExceptionHandler Error producing record NodeId: {} | TaskId: {} | Topic: {} | Partition: {} | Exception: {}",
          context.processorNodeId(), context.taskId(), record.topic(), record.partition(),
          exception.getMessage(), exception);
        return ProductionExceptionHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // no additional configuration required
    }
}
In the above code, we can return either the CONTINUE or FAIL enum value, depending on the requirement.
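For example, one plausible policy (our own illustration, not part of the service above) is to skip records that the broker rejected for their size but fail on everything else inside handle():
// inside handle(): skip oversized records, fail on anything else
if (exception instanceof RecordTooLargeException) {
    log.warn("Skipping oversized record for topic {}", record.topic(), exception);
    return ProductionExceptionHandlerResponse.CONTINUE;
}
return ProductionExceptionHandlerResponse.FAIL;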
3.3. Processing Exception
While processing each record, the stream can throw common application exceptions, such as a NullPointerException, or fail because of other processing problems.
We’ll implement a custom processing exception handler by implementing the ProcessingExceptionHandler interface:
public class CustomProcessingExceptionHandler implements ProcessingExceptionHandler {

    @Override
    public ProcessingHandlerResponse handle(ErrorHandlerContext context,
      Record<?, ?> record, Exception exception) {
        log.error("ProcessingExceptionHandler Error for record NodeId: {} | TaskId: {} | Key: {} | Value: {} | Exception: {}",
          context.processorNodeId(), context.taskId(), record.key(), record.value(),
          exception.getMessage(), exception);
        return ProcessingHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // no additional configuration required
    }
}
In the above code, we log the error and continue processing by returning the CONTINUE enum value.
We’ll also need to include the above handler in the stream properties:
props.put(StreamsConfig.PROCESSING_EXCEPTION_HANDLER_CLASS_CONFIG,
  CustomProcessingExceptionHandler.class);
If an exception escapes the above handlers, we can still catch it with an uncaught exception handler.
3.4. Uncaught Exception Handler
To handle any remaining uncaught exceptions, we’ll use the StreamsUncaughtExceptionHandler interface.
Let’s provide a custom implementation:
public class StreamExceptionHandler implements StreamsUncaughtExceptionHandler {
    @Override
    public StreamThreadExceptionResponse handle(Throwable exception) {
        log.error("Stream encountered fatal exception: {}", exception.getMessage(), exception);
        return StreamThreadExceptionResponse.REPLACE_THREAD;
    }
}
The possible return values for the StreamThreadExceptionResponse are REPLACE_THREAD, SHUTDOWN_CLIENT, and SHUTDOWN_APPLICATION.
The REPLACE_THREAD option replaces the failed stream thread with a new one, whereas SHUTDOWN_CLIENT stops the local Kafka Streams client when the error is localized to that instance.
The SHUTDOWN_APPLICATION option is reserved for critical problems that require shutting down every instance of the application.
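As an illustration, the handler can also pick the response based on the exception type. Here’s a sketch where treating a missing source topic as fatal for the local client is our own choice:
@Override
public StreamThreadExceptionResponse handle(Throwable exception) {
    // without its source topic, the local client can't make progress
    if (exception instanceof MissingSourceTopicException) {
        return StreamThreadExceptionResponse.SHUTDOWN_CLIENT;
    }
    return StreamThreadExceptionResponse.REPLACE_THREAD;
}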
We’ll also attach the handler to the KafkaStreams instance in the UserStreamService’s start() method:
kafkaStreams.setUncaughtExceptionHandler(new StreamExceptionHandler());
Additionally, we’ll add a shutdown hook in the start() method to close the stream gracefully on JVM shutdown:
Runtime.getRuntime().addShutdownHook(new Thread(kafkaStreams::close));
Next, we’ll test the UserStreamService class for various scenarios.
4. Test the Streaming Service
We’ll implement the integration tests for the UserStreamService class using Testcontainers for Kafka.
4.1. Maven Dependencies
Let’s include the kafka and junit-jupiter testcontainers dependencies:
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>kafka</artifactId>
    <version>1.19.3</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>junit-jupiter</artifactId>
    <version>1.19.3</version>
    <scope>test</scope>
</dependency>
Next, we’ll set up the integration test environment for the application.
4.2. Set Up the Test
First, we’ll set up a UserStreamLiveTest class with the required configs for the test producer and consumer:
private static Properties getProducerConfig() {
    Properties producerProperties = new Properties();
    producerProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, KAFKA_CONTAINER.getBootstrapServers());
    producerProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    producerProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, UserSerializer.class);
    return producerProperties;
}

private static Properties getConsumerConfig() {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, KAFKA_CONTAINER.getBootstrapServers());
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "test-group-" + UUID.randomUUID());
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    return props;
}
Then, we’ll declare the required Kafka-related objects and start the container and the stream service:
@Testcontainers
class UserStreamLiveTest {

    @Container
    private static final KafkaContainer KAFKA_CONTAINER =
      new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.9.0"));

    private static KafkaProducer<String, User> producer;
    private static KafkaConsumer<String, Long> consumer;
    private static UserStreamService streamService;

    @BeforeAll
    static void setup() {
        KAFKA_CONTAINER.start();
        streamService = new UserStreamService();
        producer = new KafkaProducer<>(getProducerConfig());
        consumer = new KafkaConsumer<>(getConsumerConfig());
        new Thread(() -> streamService.start(KAFKA_CONTAINER.getBootstrapServers())).start();
    }
}
In the code above, we start the Kafka Testcontainer and then call the stream service’s start() method in a separate thread.
4.3. Implement the Stream Service Tests
We’ll first test a scenario where valid users are sent, and the streaming service aggregates the data and sends it to the outbound topic:
@Test
void givenValidUserIsSent_whenStreamServiceStarts_thenAggregatedCountIsSent() {
    producer.send(new ProducerRecord<>("user-topic", "x1", new User("1", "user1", "US")));
    producer.send(new ProducerRecord<>("user-topic", "x2", new User("2", "user2", "DE")));

    consumer.subscribe(List.of("users_per_country"));

    Awaitility.await()
      .atMost(45, TimeUnit.SECONDS)
      .pollInterval(Duration.ofSeconds(1))
      .untilAsserted(() -> {
          ConsumerRecords<String, Long> records = consumer.poll(Duration.ofMillis(500));
          Map<String, Long> counts = StreamSupport.stream(records.spliterator(), false)
            .collect(Collectors.toMap(ConsumerRecord::key, ConsumerRecord::value));
          assertTrue(counts.containsKey("US"));
          assertTrue(counts.containsKey("DE"));
          assertEquals(1L, counts.get("US"));
          assertEquals(1L, counts.get("DE"));
      });
}
In the above test, we verify that the aggregated counts match the produced messages.
Next, let’s test what happens when the producer sends invalid JSON to the topic:
@Test
void givenInvalidUserIsSent_whenStreamServiceStarts_thenAggregatedCountIsEmpty() throws Exception {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, KAFKA_CONTAINER.getBootstrapServers());
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
    KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);

    byte[] invalidJson = "{ invalid json".getBytes(StandardCharsets.UTF_8);
    producer.send(new ProducerRecord<>("user-topic", "x3", invalidJson));

    consumer.subscribe(List.of("users_per_country"));

    Awaitility.await()
      .atMost(30, TimeUnit.SECONDS)
      .pollInterval(Duration.ofSeconds(1))
      .untilAsserted(() -> {
          ConsumerRecords<String, Long> records = consumer.poll(Duration.ofMillis(500));
          assertTrue(records.isEmpty());
      });
}
From the logs, we verify that the record failed to deserialize in the stream:
$ 21:53:28.534 [user-country-aggregatorfc997c10-StreamThread-1] ERROR c.b.kafkastreams.UserDeserializer - Error deserializing the message for topic user-topic
Finally, we’ll test the scenario when the producer sends null data and validate that it’s filtered out in the stream:
@Test
void givenValidAndNullUserIsSent_whenStreamServiceIsRunning_thenReturnAggregatedCount() {
    producer.send(new ProducerRecord<>("user-topic", "x4", new User("4", "user4", "IE")));
    producer.send(new ProducerRecord<>("user-topic", "x5", null));

    consumer.subscribe(List.of("users_per_country"));

    Awaitility.await()
      .atMost(30, TimeUnit.SECONDS)
      .pollInterval(Duration.ofSeconds(1))
      .untilAsserted(() -> {
          ConsumerRecords<String, Long> records = consumer.poll(Duration.ofMillis(500));
          Map<String, Long> counts = StreamSupport.stream(records.spliterator(), false)
            .collect(Collectors.toMap(ConsumerRecord::key, ConsumerRecord::value));
          assertTrue(counts.containsKey("IE"));
          assertEquals(1L, counts.get("IE"));
      });
}
If the UserStreamService class didn’t have the null check in the filter, we’d see the processing handler’s log message:
$ 21:59:18.574 [user-country-aggregator41942d3b-StreamThread-1] ERROR c.b.k.CustomProcessingExceptionHandler - ProcessingExceptionHandler Error for record NodeId: KSTREAM-FILTER-0000000001 | TaskId: 0_0 | Key: x4 | Value: null | Exception: Cannot invoke "com.baeldung.kafkastreams.User.country()" because "user" is null
The CustomProcessingExceptionHandler’s handle() method logs the above error message before the stream continues processing.
5. Conclusion
In this article, we learned how to implement a Kafka streaming service with an example. We also implemented handlers for the various exception types: deserialization, production, processing, and uncaught exceptions.
Finally, we tested the entire setup by sending messages to the stream and receiving the aggregated count.
As always, the example code can be found over on GitHub.