Consensus failures in distributed systems happen because nodes can’t agree on the state of the system, leading to data inconsistencies or service outages.
Here’s a typical scenario: in a Kafka cluster, consumers report that messages are being processed twice or not at all, and the controller logs are full of entries like `[Error] Controller 0-0: Error sending fetch request to broker 2: (error code 5, LEADER_NOT_AVAILABLE)` or `[Error] Controller 0-0: Error sending produce request to broker 1: (error code 5, LEADER_NOT_AVAILABLE)`.
This specific error, `LEADER_NOT_AVAILABLE`, means the Kafka controller (the brain of the cluster) tried to reach a broker it expected to be the leader for a partition, but that broker either didn’t respond or reported that it wasn’t the leader, typically because the partition has no available leader or a leader election is still in progress. Until a leader is available, messages can’t be produced to or consumed from that partition.
Common Causes and Fixes
- Broker Crashing or Unresponsive:
  - Diagnosis: Check the broker logs (e.g., `/var/log/kafka/server.log`) for signs of crashes (`OutOfMemoryError`, `StackOverflowError`, `fatal error`) or long garbage collection pauses. Monitor broker resource utilization (CPU, memory, disk I/O) using tools like `top`, `htop`, or Prometheus/Grafana.
  - Fix: If a broker is consistently crashing due to OOM, increase its JVM heap size. For example, in `kafka-server-start.sh` or a systemd unit file, find the `KAFKA_HEAP_OPTS` environment variable and adjust it, e.g., `export KAFKA_HEAP_OPTS="-Xmx8g -Xms8g"`. If it’s resource contention, scale up the underlying hardware or optimize other services on the same machine.
  - Why it works: Providing more memory or reducing contention allows the broker process to run without crashing or being starved of resources, enabling it to respond to controller requests.
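The log check in the Diagnosis step can be scripted. The following is a minimal sketch, assuming the crash signatures listed above appear verbatim in the log text; the helper name `count_crash_signatures` is made up for illustration and is not part of any Kafka tooling.

```python
from collections import Counter

# Signatures named in the Diagnosis step; extend the tuple for your environment.
CRASH_PATTERNS = ("OutOfMemoryError", "StackOverflowError", "fatal error")


def count_crash_signatures(log_lines):
    """Count how many lines mention each crash signature (case-insensitive)."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for pattern in CRASH_PATTERNS:
            if pattern.lower() in lowered:
                counts[pattern] += 1
    return counts
```

To run it against a real broker, pass an open file handle, for example `count_crash_signatures(open("/var/log/kafka/server.log", errors="replace"))`, adjusting the path to wherever your installation writes its logs.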
- Network Partition:
  - Diagnosis: Use `ping` and `traceroute` between controller nodes and affected brokers to check for packet loss or high latency. Examine firewall logs on both the client and server sides for dropped connections. Tools like `tcpdump` can reveal whether packets are even reaching the destination.
  - Fix: If a firewall is blocking traffic, open the necessary Kafka ports (e.g., 9092 for clients, 9093 if you run a separate inter-broker listener on it, and 2181, 2888, and 3888 for ZooKeeper if applicable). If it’s a network misconfiguration, correct routing or switch configurations. Ensure `advertised.listeners` and `listeners` in `server.properties` correctly reflect the network interfaces brokers should use.
  - Why it works: Restoring network connectivity allows the controller to communicate with the brokers as expected, resolving the `LEADER_NOT_AVAILABLE` error by enabling the controller to find and interact with the actual leader.
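Where `ping` succeeds but a firewall still blocks the Kafka ports, a plain TCP connection test is more telling. Here is a rough sketch using only the Python standard library; the function names and the three-second timeout are arbitrary choices for illustration.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to whether a TCP connection could be opened."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}
```

Running `check_ports("<broker_host>", [9092, 9093])` from the controller node shows which listeners are reachable from there. A successful connection only proves the port is open; it says nothing about whether `advertised.listeners` points clients at the right address.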
- ZooKeeper Issues (if using ZooKeeper for Kafka metadata):
  - Diagnosis: Check ZooKeeper server logs (`zookeeper.out`) for errors like `ZooKeeperServer.myid file is missing` or `Out of memory`. Verify ZooKeeper quorum health: every node should report `Mode: follower` or `Mode: leader`, with exactly one leader. Use `echo stat | nc <zookeeper_host> 2181` to check individual ZooKeeper node status (on ZooKeeper 3.5 and later, `stat` may first need to be allowed via `4lw.commands.whitelist`).
  - Fix: Ensure ZooKeeper nodes can communicate with each other. If `myid` is missing, recreate it in the ZooKeeper data directory. If OOM, increase ZooKeeper’s JVM heap size (e.g., `ZK_SERVER_HEAP` or `SERVER_JVMFLAGS` in `conf/java.env`, or the equivalent setting in your systemd unit). Ensure ZooKeeper’s `tickTime`, `syncLimit`, and `initLimit` are appropriately configured for your network.
  - Why it works: Kafka relies heavily on ZooKeeper for leader election and metadata. A healthy ZooKeeper ensemble ensures that Kafka brokers can correctly register, discover leaders, and maintain cluster state.
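The `stat` check above can be repeated across the whole ensemble with a few lines of Python. This is a sketch under two assumptions: that the `stat` four-letter command is enabled on your ZooKeeper nodes, and that its output contains a `Mode:` line. The function names are illustrative only.

```python
import socket


def fetch_zk_stat(host, port=2181, timeout=3.0):
    """Send the 'stat' four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stat")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_zk_mode(stat_output):
    """Return the lower-cased value of the 'Mode:' line, or None if absent."""
    for line in stat_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "mode":
            return value.strip().lower()
    return None


def quorum_looks_healthy(modes):
    """True when exactly one node is leader and all others are followers."""
    return modes.count("leader") == 1 and all(
        mode in ("leader", "follower") for mode in modes
    )
```

For example, `quorum_looks_healthy([parse_zk_mode(fetch_zk_stat(h)) for h in hosts])` gives a quick yes/no for a list of ensemble hosts. A node that does not answer or rejects the command yields `None`, which the check treats as unhealthy.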
- Broker Disk Full or I/O Throttling:
  - Diagnosis: Monitor disk space on broker nodes (`df -h`). Check broker logs for `IOError` or `KafkaException: Failed to write to log`. Use `iostat -xz 1` to observe disk utilization and await times.
  - Fix: Free up disk space by deleting old logs or increasing storage capacity. If I/O is the bottleneck, upgrade to faster disks (SSDs) or optimize Kafka’s log retention policies (`log.retention.hours`, `log.retention.bytes`) to prevent disks from filling up.
  - Why it works: Kafka needs to write data to disk for durability and to serve requests. Full disks or slow I/O prevent these operations, making brokers appear unresponsive and causing leader election failures.
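The `df -h` check is easy to turn into an alert before the disk actually fills. A small sketch follows, assuming an 85% threshold (an arbitrary value chosen for illustration) and that you pass the mount point holding the broker’s log directories.

```python
import shutil


def used_fraction(total_bytes, used_bytes):
    """Fraction of the volume in use, between 0.0 and 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def disk_is_nearly_full(path, threshold=0.85):
    """True when the volume containing `path` is at or above the threshold."""
    usage = shutil.disk_usage(path)
    return used_fraction(usage.total, usage.used) >= threshold
```

Point `disk_is_nearly_full` at the directory configured in `log.dirs` rather than at `/`, since Kafka data often lives on its own volume.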
- Incorrect `replica.lag.time.max.ms` Configuration:
  - Diagnosis: This is a more subtle one. If brokers are healthy but intermittently slow to replicate, a follower that hasn’t caught up with the leader within this threshold is dropped from the in-sync replica set (ISR), which shrinks the pool of brokers eligible to take over as leader. Check `server.properties` for `replica.lag.time.max.ms`.
  - Fix: Increase `replica.lag.time.max.ms` (e.g., from 10000 ms to 30000 ms or higher; the default is 10000 ms before Kafka 2.5 and 30000 ms from 2.5 onward). This gives replicas more time to catch up before being considered out of sync.
  - Why it works: This setting is a timeout for how long a replica can lag before it is considered out of sync. Increasing it provides a longer grace period for temporary network glitches or brief broker slowdowns, so an eligible in-sync replica is still available when a leader has to be elected.
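To see how the threshold changes which replicas count as in sync, here is a deliberately simplified model. It is not Kafka’s implementation: it assumes each follower has a single “last caught up” timestamp and applies the threshold to it directly. The broker IDs and timestamps in the usage note are made up.

```python
def in_sync_replicas(last_caught_up_ms, now_ms, replica_lag_time_max_ms):
    """Return the broker ids whose last caught-up time is within the threshold.

    last_caught_up_ms maps broker id -> timestamp (ms) when that replica was
    last fully caught up with the leader. Simplified model, not Kafka's code.
    """
    return sorted(
        broker_id
        for broker_id, caught_up in last_caught_up_ms.items()
        if now_ms - caught_up <= replica_lag_time_max_ms
    )
```

With `last_caught_up_ms = {1: 100_000, 2: 85_000, 3: 60_000}` and `now_ms = 100_000`, a threshold of 10000 ms keeps only broker 1 in sync, while 30000 ms keeps brokers 1 and 2. Broker 3, which is 40 seconds behind, is out of sync either way, which is a reminder that raising the threshold hides brief slowdowns but not a replica that is genuinely stuck.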
- Controller Overload or Misconfiguration:
  - Diagnosis: If you have multiple Kafka brokers, check the logs of the controller broker (you can often identify it by a `[Controller id=X]` tag). Is it overwhelmed with requests? Are its logs showing similar network errors to the ones affecting other brokers?
  - Fix: Ensure the controller broker has sufficient resources. If it’s a dedicated controller, it should have good network connectivity and CPU. Sometimes, simply restarting the controller broker can resolve transient issues. If you suspect a ZooKeeper interaction issue, ensure ZooKeeper is healthy.
  - Why it works: The controller is responsible for managing partitions, leaders, and replicas. If the controller itself is unhealthy or struggling, it can’t accurately track partition leaders, leading to widespread `LEADER_NOT_AVAILABLE` errors across the cluster.
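If you collect logs from several brokers, you can find the controller by looking for the tag mentioned above. This sketch assumes the tag appears literally as `[Controller id=X]` with a numeric ID; the helper name is invented for illustration.

```python
import re

# Matches the "[Controller id=X]" tag described above and captures the id.
CONTROLLER_TAG = re.compile(r"\[Controller id=(\d+)\]")


def controller_ids(log_lines):
    """Return the set of broker ids that appear in controller-tagged lines."""
    ids = set()
    for line in log_lines:
        match = CONTROLLER_TAG.search(line)
        if match:
            ids.add(int(match.group(1)))
    return ids
```

Seeing more than one ID in logs from the same time window may indicate that the controller role moved between brokers, which is worth correlating with the timestamps of the `LEADER_NOT_AVAILABLE` errors.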
The next error you’ll likely hit after fixing these is related to partition reassignments or unclean leader elections, as the system tries to recover from the prior state of instability.