Kubernetes controller-manager leader election failures
Your Deployment has stopped scaling. Nodes cordoned hours ago are still draining. Garbage collection is paused, and orphaned volumes are not being cleaned up. The kube-controller-manager runs these reconciliation loops, and in an HA cluster only the leader performs work. When leader election fails, the controller-manager exits, and the control plane stops acting on desired state. Existing workloads keep running, but nothing new is managed.
The kube-controller-manager coordinates through a Lease object in the coordination.k8s.io API group. The leader must renew the lease before --leader-elect-renew-deadline (default 10 seconds) elapses. The lease itself expires after --leader-elect-lease-duration (default 15 seconds). Renewal is attempted every --leader-elect-retry-period (default 2 seconds). If a write to etcd is too slow, if the API server is saturated, if RBAC is stripped, or if the election timing is misconfigured, the leader loses the lock, logs leaderelection lost, and exits. During the gap, no instance holds a valid lease, so controllers stop reconciling.
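The relationship between the three timing flags can be worked through with simple arithmetic. The sketch below is illustrative only: `election_margins` is a hypothetical helper, not part of any Kubernetes tooling, and it assumes all values are whole seconds. It estimates roughly how many renewal attempts fit inside the renew deadline and how much slack remains between the leader giving up and the lease expiring.

```shell
# Hypothetical helper: summarize the timing budget implied by the
# leader-election flags. Arguments are whole seconds:
#   election_margins LEASE_DURATION RENEW_DEADLINE RETRY_PERIOD
election_margins() {
  lease="$1"; renew="$2"; retry="$3"
  # Approximate number of renewal attempts that fit in the renew deadline.
  attempts=$(( renew / retry ))
  # Time between the leader giving up and the lease expiring.
  slack=$(( lease - renew ))
  echo "attempts=${attempts} slack=${slack}s"
}

# Defaults quoted in the text: 15s lease, 10s renew deadline, 2s retry.
election_margins 15 10 2   # prints: attempts=5 slack=5s
```

With the defaults, the leader gets about five attempts before it gives up, and a standby cannot take over until about five seconds after that.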
What this means
In an HA cluster, kube-controller-manager instances use the Lease named kube-controller-manager in the kube-system namespace. If the leader cannot renew, typically because a Lease update is blocked on slow etcd fsync or API server saturation, it logs:
failed to renew lease kube-system/kube-controller-manager: failed to tryAcquireOrRenew context deadline exceeded
It then logs leaderelection lost and exits. The container restarts and re-contests. This is intentional: the leader prefers to die rather than continue with a potentially stale view. The cost is that a write-latency spike lasting longer than the renew deadline can trigger a full controller restart and a reconciliation pause.
A related failure mode is the leaderless window after a clean shutdown. Because the kube-controller-manager does not release the lease on exit, the old lease persists until its TTL runs out, ~15 seconds with default settings. During that window no leader is active even if another instance is healthy.
```mermaid
flowchart TD
A[etcd disk latency spike] --> B[API server mutating latency rises]
B --> C[Lease PUT request times out]
C --> D[Leader fails to renew before deadline]
D --> E[Process exits with leaderelection lost]
E --> F[No valid lease for ~15s TTL window]
F --> G[Controllers stop reconciling]
```
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| etcd or API server latency | context deadline exceeded in logs during lease renewal; elevated etcd fsync or API mutating latency | etcd_disk_wal_fsync_duration_seconds and API server p99 latency |
| Lease timing misconfiguration | Premature leadership loss under light load because the renew deadline is too close to the lease duration | Pod spec for --leader-elect-lease-duration and --leader-elect-renew-deadline values |
| RBAC denied on Lease operations | Forbidden errors in controller-manager or audit logs when updating the Lease | Default ClusterRole system:kube-controller-manager permissions on leases |
| Retrieving-lock timeout cascade | After one failed acquisition, the controller-manager cannot recover and enters CrashLoopBackOff | Pod restart count and logs for timed out waiting for the condition |
| Leaderless window after clean exit | A ~15 second gap in reconciliation immediately after a rolling restart or cordon of the leader | Lease renewTime relative to the event, compared against leaseDurationSeconds |
Quick checks
```shell
# Check the current Lease holder and freshness
kubectl get lease kube-controller-manager -n kube-system -o yaml
# Check controller-manager pod health and restart count
kubectl get pods -n kube-system -l component=kube-controller-manager
# Check logs for leader election failures
kubectl logs -n kube-system kube-controller-manager-<node> | grep -iE "leaderelection|renew lease|failed to acquire"
# Check etcd WAL fsync latency (stacked or self-managed etcd).
# Note: kubeadm clusters typically serve etcd metrics on http://127.0.0.1:2381/metrics;
# port 2379 usually requires client TLS certificates.
curl -s http://localhost:2379/metrics | grep '^etcd_disk_wal_fsync_duration_seconds'
# Check API server mutating latency (lease renewals use PUT)
kubectl get --raw /metrics | grep 'apiserver_request_duration_seconds' | grep -E 'verb="(POST|PUT)"'
# Check default controller-manager ClusterRole for lease permissions
kubectl get clusterrole system:kube-controller-manager -o yaml | grep -C 3 leases
# Check configured leader-election flags
kubectl get pod -n kube-system -l component=kube-controller-manager -o yaml | grep -E 'leader-elect-lease-duration|leader-elect-renew-deadline'
```
How to diagnose it
- Confirm the Lease state. Run `kubectl get lease kube-controller-manager -n kube-system -o yaml`. Check `holderIdentity` to see which instance is the leader. Check whether `renewTime` is within `leaseDurationSeconds` of the current time, allowing for clock skew. If the lease has not been renewed within its TTL, there is no active leader and reconciliation is stopped.
- Read the controller-manager logs. Look for `failed to renew lease`, `leaderelection lost`, or `timed out waiting for the condition`. The exact message separates a renewal timeout from an acquisition failure. Renewal timeouts point to API server or etcd latency; acquisition timeouts point to a stuck lock or extreme latency.
- Check etcd and API server latency. Lease renewals are synchronous writes through the API server to etcd. If etcd WAL fsync p99 is above 100 ms or API server mutating latency is elevated, the renewal will time out. See Kubernetes API server etcd latency: detection and cascading failures.
- Verify leader election flag margins. If the log shows premature loss even when the API server is healthy, compare `--leader-elect-lease-duration` and `--leader-elect-renew-deadline`. If they are set too close together, for example 25s and 20s, there is insufficient time for retries before expiration, causing repeated flapping.
- Check for CrashLoopBackOff from lock retrieval failure. If the pod is restarting repeatedly and logs show `timed out waiting for the condition` on retrieving the lock, the controller-manager may not recover automatically until the underlying latency is resolved. This matches the behavior described in GitHub issue #117922.
- Audit for RBAC denials. If logs or audit logs show 403 Forbidden on lease operations, verify that the controller-manager identity retains `get`, `create`, and `update` permissions on `leases` in `kube-system`.
- Map the gap to a control plane event. If the failure began during a rolling update of the control plane nodes, the leaderless window caused by the lease not being released on exit may be the cause. Expect a reconciliation gap equal to the full lease TTL, ~15 seconds by default, per leader transition until the old lease expires.
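The log-reading step above can be roughed out as a small filter. This is a sketch, not an exhaustive parser: `classify_election_log` is a hypothetical helper, and the patterns are just the message fragments quoted in this guide, so real log wording may differ between Kubernetes versions. Each line is counted once, with RBAC and renewal matches taking priority.

```shell
# Hypothetical helper: count leader-election failure signatures on stdin.
# Patterns are the message fragments quoted in this guide (assumption:
# your version logs the same wording).
classify_election_log() {
  awk '
    /[Ff]orbidden/                        { rbac++; next }
    /failed to renew lease/               { renew++; next }
    /timed out waiting for the condition/ { acquire++; next }
    /leaderelection lost/                 { lost++; next }
    END {
      printf "renewal_timeouts=%d acquisition_timeouts=%d rbac_denials=%d leadership_lost=%d\n", renew, acquire, rbac, lost
    }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl logs -n kube-system kube-controller-manager-<node> --previous | classify_election_log
```

A non-zero `renewal_timeouts` with zero `rbac_denials` points toward latency rather than permissions; the reverse points toward RBAC drift.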
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Lease object freshness | A stale lease means no leader is reconciling state | renewTime older than leaseDurationSeconds plus a clock skew margin |
| Controller-manager pod restarts | The leader exits on renewal failure | More than 1 restart in 5 minutes |
| etcd WAL fsync latency | Every lease renewal waits on etcd fsync | p99 > 100 ms sustained |
| API server mutating request latency | Lease updates are PUT operations through the API server | p99 > 500 ms sustained |
| Controller workqueue depth | If the leader was already falling behind, depth grows before the exit | Depth > 100 sustained for core queues |
| Audit log 403 rate on leases | RBAC denial blocks renewal | Any sustained 403 on coordination.k8s.io/v1/namespaces/kube-system/leases |
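The lease freshness signal in the table can be evaluated with a few lines of arithmetic. The following is a minimal sketch: `lease_status` is a hypothetical helper, it takes timestamps as epoch seconds to stay portable, and the 2-second default skew margin is an arbitrary assumption you should adjust to your environment.

```shell
# Hypothetical helper: decide whether a Lease is stale.
#   lease_status RENEW_EPOCH LEASE_SECONDS NOW_EPOCH [SKEW_SECONDS]
# All times are epoch seconds; SKEW_SECONDS defaults to 2 (an assumption).
lease_status() {
  renew_epoch="$1"; lease_seconds="$2"; now_epoch="$3"; skew="${4:-2}"
  age=$(( now_epoch - renew_epoch ))
  # Stale when the last renewal is older than the TTL plus the skew margin.
  if [ "$age" -gt $(( lease_seconds + skew )) ]; then
    echo "stale age=${age}s"
  else
    echo "fresh age=${age}s"
  fi
}

# Usage against a live cluster (date -d is GNU-specific):
#   renew=$(kubectl get lease kube-controller-manager -n kube-system -o jsonpath='{.spec.renewTime}')
#   lease_status "$(date -d "$renew" +%s)" 15 "$(date +%s)"
```

A `stale` result while controller-manager pods report Running suggests that no instance is currently winning the election.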
Fixes
If the cause is etcd or API server latency
Fix the storage or API server bottleneck first. Reduce etcd disk I/O contention, defragment etcd during a maintenance window if needed, and ensure the API server is not saturated by admission webhooks or APF throttling. Do not restart the controller-manager until underlying latency drops. Restarting into the same slow environment will produce a CrashLoopBackOff.
If the cause is lease timing misconfiguration
Adjust the flags with safe margins. The renew deadline must be strictly less than the lease duration, with enough room for the retry period to fire multiple times. Avoid aggressive custom values. If you must change them, keep the renew deadline well under the lease duration and test under production load before rolling out.
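Before rolling out custom values, a quick sanity check can catch the obvious mistakes. The sketch below is an assumption-laden illustration: `validate_election_flags` is a hypothetical helper, the 1.2 factor reflects my understanding of the jitter factor client-go applies to the retry period, and the "at least three retries" threshold is an arbitrary choice to express "multiple times". Values are whole seconds.

```shell
# Hypothetical helper: sanity-check leader-election flag values.
#   validate_election_flags LEASE_DURATION RENEW_DEADLINE RETRY_PERIOD
validate_election_flags() {
  lease="$1"; renew="$2"; retry="$3"
  if [ "$renew" -ge "$lease" ]; then
    echo "error: renew deadline must be less than lease duration"
  # Integer form of renew <= 1.2 * retry (assumed client-go jitter factor).
  elif [ $(( renew * 10 )) -le $(( retry * 12 )) ]; then
    echo "error: renew deadline must exceed 1.2 x retry period"
  # Arbitrary headroom rule: want at least three retries per deadline.
  elif [ $(( renew / retry )) -lt 3 ]; then
    echo "warn: only $(( renew / retry )) retries fit in the renew deadline"
  else
    echo "ok"
  fi
}

validate_election_flags 15 10 2   # defaults from the text
```

Passing this check does not replace testing under production load; it only rules out configurations that cannot work on paper.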
If the cause is RBAC drift
Restore get, create, and update permissions on leases in the kube-system namespace for the controller-manager identity. Restart the controller-manager after fixing the Role or ClusterRoleBinding.
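One way to confirm the permissions are back is to impersonate the controller-manager identity with `kubectl auth can-i`. This is a sketch under assumptions: `check_lease_rbac` is a hypothetical wrapper, it assumes the controller-manager authenticates as the user `system:kube-controller-manager`, and your own account must be allowed to impersonate users. The `KUBECTL` variable is only there so the binary can be swapped out.

```shell
# Hypothetical wrapper: ask the API server whether the controller-manager
# identity may get, create, and update Leases in kube-system.
# Assumes the identity is the user system:kube-controller-manager.
KUBECTL="${KUBECTL:-kubectl}"

check_lease_rbac() {
  for verb in get create update; do
    printf '%s: ' "$verb"
    # can-i exits non-zero on "no"; keep going so all verbs are reported.
    $KUBECTL auth can-i "$verb" leases.coordination.k8s.io \
      -n kube-system --as system:kube-controller-manager || true
  done
}

# Usage (requires cluster access and impersonation rights):
#   check_lease_rbac
```

Each verb should report `yes`; any `no` means the Role, ClusterRole, or binding still needs repair.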
If the cause is a retrieving-lock timeout cascade
If the controller-manager is in CrashLoopBackOff after repeated acquisition timeouts, resolve the root latency first, then restart the controller-manager instance. Simply restarting without fixing API server or etcd latency will not help because the new process will hit the same timeout.
If the cause is a leaderless window during maintenance
There is no user-configurable fix for lease release on clean exit in current stable releases. The ControllerManagerReleaseLeaderElectionLockOnExit feature gate exists but remains alpha and defaults to off. Run multiple controller-manager instances so failover is possible, and schedule control plane restarts during periods of low cluster mutation. The gap equals the full lease TTL, typically ~15 seconds by default.
Prevention
- Do not tune `--leader-elect-lease-duration` and `--leader-elect-renew-deadline` to values that leave no headroom. The defaults exist for a reason.
- Treat etcd disk latency and API server mutating latency as leading indicators for control plane health. Alert on them before controller-manager restarts begin.
- Monitor controller-manager pod restart count in `kube-system` as a binary signal of leader distress.
- Protect the controller-manager RBAC bindings from accidental changes during cluster upgrades or policy audits.
- Maintain HA with at least two controller-manager instances so a single leader exit does not leave the cluster entirely without reconciliation.
How Netdata helps
- Correlate etcd WAL fsync latency with controller-manager pod restarts in `kube-system` to identify the cascade.
- Track API server request latency and error rates to catch saturation before lease renewals fail.
- Monitor control plane node disk I/O and CPU to surface resource pressure that slows etcd.
- Alert on sustained increases in controller workqueue depth as an early sign that the leader is falling behind.
- Surface audit log anomalies, including 403 responses on Lease operations, for RBAC-related leader election failures.
Related guides
- For etcd latency root cause analysis, see Kubernetes API server etcd latency: detection and cascading failures.
- For API server saturation and APF queueing, see Kubernetes API server rate limiting: APF priority levels and starvation.
- For general API server slowness, see Kubernetes API server slow or unresponsive: causes and fixes.
- For pod restart patterns, see Kubernetes pod CrashLoopBackOff: causes, diagnosis, and fixes.
- For control plane monitoring fundamentals, see Kubernetes monitoring checklist: the signals every production cluster needs.






