Kafka Producer Error Handling, Retry, and Recovery

Nishanth Reddy Emmadi
Jul 15, 2020 · 3 min read


In this blog post, let's see how to handle errors, retries, and recovery from a Kafka producer's perspective.
Consider a scenario where a microservice-based Kafka producer publishes events to a topic on a Kafka cluster of 4 brokers.
In real-time applications where data events are critical and data loss is unacceptable, error handling, retrying, and recovery play a vital role in making an application resilient.

Handling errors, retrying events for specific failures, and establishing a recovery mechanism all require an understanding of the scenarios in which a producer fails to produce a message to the cluster. Here are the main reasons:

  1. The Kafka cluster itself is down and unavailable.
  2. The Kafka producer configuration “acks” is set to “all” and some brokers are unavailable.
  3. The topic’s “min.insync.replicas” is set to 2 and only one in-sync replica is available. min.insync.replicas and acks together allow you to enforce greater durability guarantees. A typical scenario would be to create a topic with a replication factor of 3, set min.insync.replicas to 2, and produce with acks of “all”. This ensures that the producer raises an exception if a majority of replicas do not receive a write. A sample producer configuration for this setup is sketched right after this list.
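
For illustration, here is a minimal sketch of producer properties that enforce these durability settings. The broker addresses are placeholders, and keep in mind that min.insync.replicas itself is a topic/broker setting, not a producer one.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
// Placeholder broker addresses; point these at your own cluster.
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// Wait for all in-sync replicas to acknowledge each write.
props.put(ProducerConfig.ACKS_CONFIG, "all");
// Let the producer retry transient failures before the send ultimately fails.
props.put(ProducerConfig.RETRIES_CONFIG, "3");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);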

Let’s look at the options for handling these errors.

Approach 1:

The producer uses an intermediate retry topic to store retry-able events, and a consumer retries them.

In this approach, when the brokers in the cluster fail to satisfy the producer configurations such as acks and min.insync.replicas, or other broker-related metadata failures occur, those events are produced to a recovery (retry) topic. With the help of a Kafka consumer in the same application, they are polled and handed back to the producer for retrying. In my opinion, this approach is not recommended, because if the Kafka cluster itself is unavailable, the retry topic cannot help either.
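
As a rough sketch of this approach with Spring Kafka, the retrying consumer could look like the following. The topic name "orders-retry-topic" and group id "retry-consumer-group" are hypothetical, and the producer's failure callback (shown later) is assumed to publish failed events to that retry topic.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class RetryTopicConsumer {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public RetryTopicConsumer(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Poll the retry topic and hand each failed event back to the producer.
    @KafkaListener(topics = "orders-retry-topic", groupId = "retry-consumer-group")
    public void retry(ConsumerRecord<String, String> record) {
        // Re-publish to the default (original) topic; if this fails again,
        // the producer's failure callback can push the event back to the retry topic.
        kafkaTemplate.sendDefault(record.key(), record.value());
    }
}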

Approach 2:

The producer uses an intermediate database to store retry-able events, and a scheduler retries them.

In this approach, the retry-able events are saved to a database; a scheduler process then pulls the event records from the database and passes them to the producer for retrying. This is more reliable than the first approach in the case of cluster unavailability.
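
A rough sketch of this approach with Spring's scheduling support is below. FailedEvent and FailedEventRepository are assumed, hypothetical types: a simple entity holding the failed record's key and value, and a Spring Data repository over it.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class FailedEventRetryScheduler {

    private static final Logger log = LoggerFactory.getLogger(FailedEventRetryScheduler.class);

    private final FailedEventRepository failedEventRepository;
    private final KafkaTemplate<String, String> kafkaTemplate;

    public FailedEventRetryScheduler(FailedEventRepository failedEventRepository,
                                     KafkaTemplate<String, String> kafkaTemplate) {
        this.failedEventRepository = failedEventRepository;
        this.kafkaTemplate = kafkaTemplate;
    }

    // Every five minutes, pull the stored retry-able events and pass them back to the producer.
    @Scheduled(fixedRate = 300000)
    public void retryFailedEvents() {
        for (FailedEvent event : failedEventRepository.findAll()) {
            kafkaTemplate.sendDefault(event.getKey(), event.getValue())
                    .addCallback(
                            result -> failedEventRepository.delete(event),              // success: drop from the database
                            ex -> log.warn("Retry failed for key {}", event.getKey(), ex)); // keep it for the next run
        }
    }
}

Note that @Scheduled only fires if @EnableScheduling is present on a configuration class.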

Sample code snippet (the producer send with a success/failure callback):

/*
Here I am using the kafkaTemplate sendDefault method to post the event to the Kafka topic.
sendDefault picks up the topic name from the property "spring.kafka.template.default-topic".
kafkaTemplate provides other overloaded methods that give more control by letting you specify
the topic, partition, headers, and event; I like the variant that takes a ProducerRecord object.
*/
ListenableFuture<SendResult<String, String>> sendResultListenableFuture = kafkaTemplate.sendDefault(key, value);

/*
kafkaTemplate's response is a ListenableFuture; a callback is added to it
to gracefully handle success or failure.
*/
sendResultListenableFuture.addCallback(new ListenableFutureCallback<SendResult<String, String>>() {

    @Override
    public void onFailure(Throwable ex) {
        handleFailure(key, value, ex);
    }

    @Override
    public void onSuccess(SendResult<String, String> result) {
        handleSuccess(key, value, result);
    }
});
} // closes the enclosing send method, whose signature is not shown in this snippet

private void handleSuccess(String key, String value, SendResult<String, String> result) {
    log.info("The record with key : {}, value : {} is produced successfully to offset {}",
            key, value, result.getRecordMetadata().offset());
}

private void handleFailure(String key, String value, Throwable ex) {
    log.info("The record with key: {}, value: {} cannot be processed! caused by {}",
            key, value, ex.getMessage());
    // Here you can implement the code to filter based on the exception type and place the events
    // on to a retry topic as in the first approach, or into a database as in the second approach.
}
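
As a sketch of that filtering step (reusing the hypothetical FailedEvent and failedEventRepository from the scheduler example above), handleFailure could branch on whether the exception is retry-able:

private void handleFailure(String key, String value, Throwable ex) {
    log.error("The record with key: {}, value: {} cannot be processed!", key, value, ex);
    // org.apache.kafka.common.errors.RetriableException marks transient broker-side
    // errors such as NotEnoughReplicasException.
    if (ex instanceof RetriableException || ex.getCause() instanceof RetriableException) {
        // Store the event so the scheduler from the second approach can retry it later.
        failedEventRepository.save(new FailedEvent(key, value));
    } else {
        // Non-retry-able failures (for example, serialization errors) need separate handling.
        log.error("Non-retry-able failure for key {}", key);
    }
}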

Thank you.
