Kubernetes controller-manager leader election failures

Your Deployment has stopped scaling. Nodes cordoned hours ago are still draining. Garbage collection is paused, and orphaned volumes are not being cleaned up. The kube-controller-manager runs these reconciliation loops, and in an HA cluster only the leader performs work. When leader election fails, the controller-manager exits, and the control plane stops acting on desired state. Existing workloads keep running, but nothing new is managed.

The kube-controller-manager coordinates through a Lease object in the coordination.k8s.io API group. The leader attempts a renewal every --leader-elect-retry-period (default 2 seconds) and must succeed before --leader-elect-renew-deadline (default 10 seconds) elapses. The lease itself expires after --leader-elect-lease-duration (default 15 seconds). If etcd writes are too slow, the API server is saturated, RBAC permissions on the Lease are removed, or the election timing is misconfigured, the leader loses the lock, logs leaderelection lost, and exits. Until an instance acquires the lease again, no controllers reconcile.
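The relationship between the three timings can be sketched as a rough budget. The function name renewal_budget is hypothetical, and the arithmetic assumes attempts are spaced exactly one retry period apart, which ignores jitter and per-request timeouts.

```shell
#!/bin/sh
# Rough renewal budget for a set of leader-election timings (whole seconds).
# Assumption: attempts are spaced exactly one retry period apart; real
# clients add jitter, and each attempt also has its own request timeout.
renewal_budget() {
  lease=$1 renew=$2 retry=$3
  attempts=$((renew / retry))   # renewal tries that fit inside the renew deadline
  gap=$((lease - renew))        # seconds between giving up and lease expiry
  echo "attempts=$attempts gap=${gap}s"
}

renewal_budget 15 10 2   # the defaults quoted above: attempts=5 gap=5s
```

With the defaults, the leader gets about five tries before it gives up, and the lease expires about five seconds after that.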

What this means

In an HA cluster, kube-controller-manager instances use the Lease named kube-controller-manager in the kube-system namespace. If the leader cannot renew, typically because a Lease update is blocked on slow etcd fsync or API server saturation, it logs:

failed to renew lease kube-system/kube-controller-manager: failed to tryAcquireOrRenew context deadline exceeded

It then logs leaderelection lost and exits. The container restarts and re-contests the election. This is intentional: the leader prefers to die rather than continue with a potentially stale view. The cost is that a write-latency spike lasting longer than the renew deadline can trigger a full controller restart and a reconciliation pause.
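Because the process exits after losing the lease, the evidence usually sits in the previous container's log. The sketch below tallies the messages this guide mentions from log text on stdin. The function name classify_election_logs is hypothetical, and exact log wording can differ between Kubernetes versions, so treat the patterns as a starting point.

```shell
#!/bin/sh
# Tally leader-election failure signatures from log text on stdin.
# A line is counted as an acquisition timeout only if it is not already
# a renewal failure, so one log line never lands in both buckets.
classify_election_logs() {
  awk '
    {
      if ($0 ~ /failed to renew lease/) renew++
      else if ($0 ~ /timed out waiting for the condition/) acquire++
      if ($0 ~ /[Ff]orbidden/) rbac++
      if ($0 ~ /leaderelection lost/) lost++
    }
    END {
      printf "renew_failures=%d acquire_timeouts=%d rbac_denials=%d leadership_lost=%d\n",
             renew, acquire, rbac, lost
    }'
}
```

One way to feed it is kubectl logs -n kube-system kube-controller-manager-<node> --previous | classify_election_logs, where --previous selects the container instance that exited.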

A related failure mode is the leaderless window after a clean shutdown. Because the kube-controller-manager does not release the lease on exit, the old lease persists for its full TTL, 15 seconds by default. During that window no leader is active, even if another instance is healthy.

flowchart TD
    A[etcd disk latency spike] --> B[API server mutating latency rises]
    B --> C[Lease PUT request times out]
    C --> D[Leader fails to renew before deadline]
    D --> E[Process exits with leaderelection lost]
    E --> F[Stale lease blocks takeover for up to ~15s]
    F --> G[Controllers stop reconciling]

Common causes

Cause	What it looks like	First thing to check
etcd or API server latency	`context deadline exceeded` in logs during lease renewal; elevated etcd fsync or API mutating latency	`etcd_disk_wal_fsync_duration_seconds` and API server p99 latency
Lease timing misconfiguration	Premature leadership loss under light load because the renew deadline is too close to the lease duration	Pod spec for `--leader-elect-lease-duration` and `--leader-elect-renew-deadline` values
RBAC denied on Lease operations	`Forbidden` errors in controller-manager or audit logs when updating the Lease	Default ClusterRole `system:kube-controller-manager` permissions on `leases`
Retrieving-lock timeout cascade	After one failed acquisition, the controller-manager cannot recover and enters CrashLoopBackOff	Pod restart count and logs for `timed out waiting for the condition`
Leaderless window after clean exit	A ~15 second gap in reconciliation immediately after a rolling restart or cordon of the leader	Lease `renewTime` relative to the event, compared against `leaseDurationSeconds`

Quick checks

# Check the current Lease holder and freshness
kubectl get lease kube-controller-manager -n kube-system -o yaml

# Check controller-manager pod health and restart count
kubectl get pods -n kube-system -l component=kube-controller-manager

# Check logs for leader election failures
kubectl logs -n kube-system kube-controller-manager-<node> | grep -iE "leaderelection|renew lease|failed to acquire"

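# Check etcd WAL fsync latency (stacked or self-managed etcd)
# Plain HTTP on 2379 works only if etcd serves clients without TLS; kubeadm clusters typically expose metrics on http://127.0.0.1:2381/metrics instead
curl -s http://localhost:2379/metrics | grep '^etcd_disk_wal_fsync_duration_seconds'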

# Check API server mutating latency (lease renewals use PUT)
kubectl get --raw /metrics | grep 'apiserver_request_duration_seconds' | grep -E 'verb="(POST|PUT)"'

# Check default controller-manager ClusterRole for lease permissions
kubectl get clusterrole system:kube-controller-manager -o yaml | grep -C 3 leases

# Check configured leader-election flags (if none are set, the defaults apply)
kubectl get pod -n kube-system -l component=kube-controller-manager -o yaml | grep -E 'leader-elect-(lease-duration|renew-deadline|retry-period)'

How to diagnose it

Confirm the Lease state. Run kubectl get lease kube-controller-manager -n kube-system -o yaml. Check holderIdentity to see which instance is the leader. Check whether renewTime is within leaseDurationSeconds of the current time, allowing for clock skew. If the lease has not been renewed within its TTL, there is no active leader and reconciliation is stopped.
Read the controller-manager logs. Look for failed to renew lease, leaderelection lost, or timed out waiting for the condition. The exact message separates a renewal timeout from an acquisition failure. Renewal timeouts point to API server or etcd latency; acquisition timeouts point to a stuck lock or extreme latency.
Check etcd and API server latency. Lease renewals are synchronous writes through the API server to etcd. If etcd WAL fsync p99 is above 100 ms or API server mutating latency is elevated, renewals can time out. See Kubernetes API server etcd latency: detection and cascading failures.
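Verify leader election flag margins. If the log shows premature loss even when the API server is healthy, compare --leader-elect-lease-duration, --leader-elect-renew-deadline, and --leader-elect-retry-period. The defaults leave a 5-second gap between lease duration and renew deadline (15s and 10s) with a 2-second retry period. If the renew deadline is short relative to the retry period and the observed API latency, or the two values are nearly equal, there is too little room for retries before expiration, which causes repeated flapping.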
Check for CrashLoopBackOff from lock retrieval failure. If the pod is restarting repeatedly and logs show timed out waiting for the condition on retrieving the lock, the controller-manager may not recover automatically until the underlying latency is resolved. This matches the behavior described in GitHub issue #117922.
Audit for RBAC denials. If logs or audit logs show 403 Forbidden on lease operations, verify that the controller-manager identity retains get, create, and update permissions on leases in kube-system.
Map the gap to a control plane event. If the failure began during a rolling update of the control plane nodes, the leaderless window may be the cause, because the outgoing leader does not release its lease on exit. Expect a reconciliation gap of up to the full lease TTL, 15 seconds by default, per leader transition until the old lease expires.
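The freshness test in the first step can be sketched as plain arithmetic on epoch seconds. The function name lease_status and the 2-second default skew margin are assumptions for illustration. The commented usage lines rely on GNU date to parse renewTime.

```shell
#!/bin/sh
# Classify a Lease as fresh or expired from plain epoch seconds.
# Arguments: renew time, current time, leaseDurationSeconds, allowed clock skew.
lease_status() {
  renew=$1 now=$2 duration=$3 skew=${4:-2}
  age=$((now - renew))
  if [ "$age" -le $((duration + skew)) ]; then
    echo "fresh age=${age}s"
  else
    echo "expired age=${age}s"
  fi
}

# Usage against a live cluster (assumes GNU date for timestamp parsing):
#   renew=$(kubectl get lease kube-controller-manager -n kube-system -o jsonpath='{.spec.renewTime}')
#   duration=$(kubectl get lease kube-controller-manager -n kube-system -o jsonpath='{.spec.leaseDurationSeconds}')
#   lease_status "$(date -u -d "$renew" +%s)" "$(date -u +%s)" "$duration"
```

The comparison uses the local clock, so a workstation that drifts from the control plane nodes can misreport a healthy lease. Widen the skew argument if that is a concern.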

Metrics and signals to monitor

Signal	Why it matters	Warning sign
Lease object freshness	A stale lease means no leader is reconciling state	`renewTime` older than `leaseDurationSeconds` plus a clock skew margin
Controller-manager pod restarts	The leader exits on renewal failure	More than 1 restart in 5 minutes
etcd WAL fsync latency	Every lease renewal waits on etcd fsync	p99 > 100 ms sustained
API server mutating request latency	Lease updates are PUT operations through the API server	p99 > 500 ms sustained
Controller workqueue depth	If the leader was already falling behind, depth grows before the exit	Depth > 100 sustained for core queues
Audit log 403 rate on `leases`	RBAC denial blocks renewal	Any sustained 403 on `coordination.k8s.io/v1/namespaces/kube-system/leases`
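Without a Prometheus server, the fsync threshold can be approximated from the raw histogram. The sketch below reads etcd metrics text on stdin and prints the percentage of WAL fsyncs slower than 128 ms. A result above 1 roughly corresponds to p99 exceeding that bucket. The function name slow_fsync_percent is hypothetical, and the le="0.128" bucket boundary is an assumption about how etcd lays out this histogram, so confirm it against your own /metrics output.

```shell
#!/bin/sh
# Percentage of etcd WAL fsyncs slower than 128 ms, from metrics text on stdin.
# Assumption: the histogram exposes a le="0.128" bucket (the closest to 100 ms).
# Counters are cumulative since etcd started, so this is a lifetime figure,
# not a recent-window p99.
slow_fsync_percent() {
  awk '
    /^etcd_disk_wal_fsync_duration_seconds_bucket[{].*le="0[.]128"/ { fast = $NF }
    /^etcd_disk_wal_fsync_duration_seconds_count/                   { total = $NF }
    END {
      if (total > 0) printf "%.2f\n", (total - fast) * 100 / total
      else print "n/a"
    }'
}
```

Pipe the output of the curl command from the quick checks into slow_fsync_percent. For a recent-window view, take two samples a minute apart and compare the differences instead of the raw totals.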

Fixes

If the cause is etcd or API server latency

Fix the storage or API server bottleneck first. Reduce etcd disk I/O contention, defragment etcd during a maintenance window if needed, and ensure the API server is not saturated by admission webhooks or APF throttling. Do not restart the controller-manager until underlying latency drops. Restarting into the same slow environment will produce a CrashLoopBackOff.

If the cause is lease timing misconfiguration

Adjust the flags with safe margins. The renew deadline must be strictly less than the lease duration, with enough room for the retry period to fire multiple times. Avoid aggressive custom values. If you must change them, keep the renew deadline well under the lease duration and test under production load before rolling out.
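Before rolling out custom values, a quick sanity check catches the combinations that cannot work. The sketch below takes whole seconds. The function name validate_election_timings is hypothetical, and the 1.2 jitter factor is an assumption based on the client-go leaderelection package, so verify it against the version you run.

```shell
#!/bin/sh
# Sanity-check leader-election timings given in whole seconds.
# Assumption: the client requires renew deadline > retry period x 1.2
# (the jitter factor); confirm this for your Kubernetes version.
validate_election_timings() {
  lease=$1 renew=$2 retry=$3
  if [ "$renew" -ge "$lease" ]; then
    echo "invalid: renew deadline must be less than lease duration"
    return 1
  fi
  # renew > retry * 1.2, kept in integer arithmetic
  if [ $((renew * 10)) -le $((retry * 12)) ]; then
    echo "invalid: renew deadline must exceed retry period x 1.2"
    return 1
  fi
  echo "ok"
}

validate_election_timings 15 10 2   # defaults: prints ok
```

Passing this check only rules out invalid combinations. It does not show that the margins are adequate under production latency.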

If the cause is RBAC drift

Restore get, create, and update permissions on leases in the kube-system namespace for the controller-manager identity. Restart the controller-manager after fixing the role or its binding.
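To confirm the permissions are back, kubectl auth can-i can ask the API server directly. This sketch assumes the caller is allowed to impersonate users and that the controller-manager authenticates as the user system:kube-controller-manager. Both are assumptions about your cluster, and check_lease_rbac is a hypothetical helper name.

```shell
#!/bin/sh
# Ask the API server whether the controller-manager identity can still
# work with its Lease. Assumes impersonation rights and the user name
# system:kube-controller-manager; pass another identity as the first argument.
check_lease_rbac() {
  user=${1:-system:kube-controller-manager}
  lease=leases.coordination.k8s.io/kube-controller-manager
  for verb in get update; do
    printf '%s: ' "$verb"
    kubectl auth can-i "$verb" "$lease" -n kube-system --as "$user"
  done
  # create cannot be scoped to a resource name, so check it on the type
  printf 'create: '
  kubectl auth can-i create leases.coordination.k8s.io -n kube-system --as "$user"
}
```

The get and update checks name the Lease explicitly because a rule restricted to that resource name would answer no to a query about all leases.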

If the cause is a retrieving-lock timeout cascade

If the controller-manager is in CrashLoopBackOff after repeated acquisition timeouts, resolve the root latency first, then restart the controller-manager instance. Simply restarting without fixing API server or etcd latency will not help because the new process will hit the same timeout.

If the cause is a leaderless window during maintenance

There is no user-configurable fix for lease release on clean exit in current stable releases. The ControllerManagerReleaseLeaderElectionLockOnExit feature gate exists but remains alpha and defaults to off. Run multiple controller-manager instances so failover is possible, and schedule control plane restarts during periods of low cluster mutation. The gap can last up to the full lease TTL, 15 seconds by default.

Prevention

Do not tune --leader-elect-lease-duration and --leader-elect-renew-deadline to values that leave no headroom. The defaults (15s, 10s, and a 2s retry period) leave room for several renewal attempts before the lease expires.
Treat etcd disk latency and API server mutating latency as leading indicators for control plane health. Alert on them before controller-manager restarts begin.
Monitor controller-manager pod restart count in kube-system as a direct signal of leader distress.
Protect the controller-manager RBAC bindings from accidental changes during cluster upgrades or policy audits.
Maintain HA with at least two controller-manager instances so a single leader exit does not leave the cluster entirely without reconciliation.
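The last point is easy to verify from the pod list. The sketch below counts instances in the Running state using the component label from the quick checks. The helper names are hypothetical, and the column position assumes the default kubectl get pods output.

```shell
#!/bin/sh
# Count kube-controller-manager pods whose STATUS column reads Running.
# Assumes default `kubectl get pods` columns: NAME READY STATUS RESTARTS AGE.
count_running_instances() {
  kubectl get pods -n kube-system -l component=kube-controller-manager --no-headers |
    awk '$3 == "Running" { n++ } END { print n + 0 }'
}

# Warn when fewer than two instances are available for failover.
ha_check() {
  n=$(count_running_instances)
  if [ "$n" -ge 2 ]; then
    echo "ok: $n instances running"
  else
    echo "warning: only $n instance(s) running"
  fi
}
```

Running only means the container is up. It does not say which instance holds the lease, so pair this with the Lease holderIdentity check.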

How Netdata helps

Correlate etcd WAL fsync latency with controller-manager pod restarts in kube-system to identify the cascade.
Track API server request latency and error rates to catch saturation before lease renewals fail.
Monitor control plane node disk I/O and CPU to surface resource pressure that slows etcd.
Alert on sustained increases in controller workqueue depth as an early sign that the leader is falling behind.
Surface audit log anomalies, including 403 responses on Lease operations, for RBAC-related leader election failures.

Related guides

For etcd latency root cause analysis, see Kubernetes API server etcd latency: detection and cascading failures.
For API server saturation and APF queueing, see Kubernetes API server rate limiting: APF priority levels and starvation.
For general API server slowness, see Kubernetes API server slow or unresponsive: causes and fixes.
For pod restart patterns, see Kubernetes pod CrashLoopBackOff: causes, diagnosis, and fixes.
For control plane monitoring fundamentals, see Kubernetes monitoring checklist: the signals every production cluster needs.
