Kubernetes API server etcd latency: detection and cascading failures

When etcd slows down, the entire control plane slows with it. A few extra milliseconds of disk fsync latency turn into hung kubectl commands, backed-up controller queues, and eventually a cluster that cannot schedule pods or update endpoints. This guide shows how to detect the etcd latency cascade, confirm whether storage is the root cause, and break the feedback loop before the cluster becomes effectively read-only.

What this means

etcd serializes every Kubernetes mutation. Every API server write becomes a Raft proposal that must fsync to the WAL before etcd acknowledges it. When the disk under etcd is slow, every fsync waits longer. The API server holds mutating requests open until etcd responds. Requests pile up in the inflight queue. Once the queue hits the limit, the API server returns 429 Too Many Requests. Controllers that depend on writes (scheduler, replica set controller, and others) fall behind and retry. Retries generate more write load. The result is a feedback loop: slow disk -> slow etcd -> slow API server -> retry storm -> amplified etcd load.

The failure is asymmetric. Read operations served from the API server watch cache may still respond quickly, so kubectl get can look healthy while kubectl create or kubectl delete hangs. This asymmetry makes the cascade easy to misdiagnose as an API server problem rather than a storage problem.

Common causes

Cause	What it looks like	First thing to check
Disk I/O saturation on the etcd host	WAL fsync p99 climbing above 10ms; leader election storms	`iostat -x 1` on etcd nodes
Network-attached storage latency	Variable fsync spikes; cloud burst credit exhaustion	Disk type and burst balance
etcd database approaching quota	DB size near 80% of the default 2GB; writes fail with NOSPACE alarm	`etcdctl endpoint status --write-out=table`
etcd compaction or defragmentation	Periodic latency spikes aligned with maintenance windows	etcd logs for “compact” or “defrag”
Network partition between API server and etcd	Uniform mutating latency elevation; API server readyz etcd check fails	`etcdctl endpoint health` and peer RTT

Quick checks

Run these in order. All are read-only.

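# Check etcd WAL fsync latency (etcd metrics endpoint).
# kubeadm serves etcd metrics over plain HTTP on 127.0.0.1:2381 (--listen-metrics-urls);
# port 2379 requires TLS client certificates. Adjust the URL for other deployments.
curl -s http://127.0.0.1:2381/metrics | grep ^etcd_disk_wal_fsync_duration_seconds

# Check etcd backend commit latency
curl -s http://127.0.0.1:2381/metrics | grep ^etcd_disk_backend_commit_duration_seconds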

# Check etcd cluster health and member status
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health

# Check etcd DB size, leader status, and Raft index
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint status --cluster -w table

# Check API server readyz etcd sub-check
kubectl get --raw='/readyz?verbose' | grep -A2 etcd

# Check disk I/O wait on the etcd host
iostat -x 1 5

# Check API server inflight requests (reported separately for mutating and read-only)
kubectl get --raw='/metrics' | grep ^apiserver_current_inflight_requests

# Check 429 rejection rate
kubectl get --raw='/metrics' | grep 'apiserver_request_total' | grep 'code="429"'

# Check etcd leader changes (kubeadm metrics listener on 127.0.0.1:2381; adjust if yours differs)
curl -s http://127.0.0.1:2381/metrics | grep ^etcd_server_leader_changes_seen_total

# Check pending Raft proposals
curl -s http://127.0.0.1:2381/metrics | grep ^etcd_server_proposals_pending
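
The metrics checks above print raw cumulative histogram buckets rather than a p99 value. As a rough sketch, the helper below estimates a quantile from those buckets. The function name histogram_quantile_bound is hypothetical and not part of etcd or any standard tool. It assumes a single-series histogram whose buckets appear in ascending le order, and it reports the upper bound of the matching bucket without interpolation. The counts are totals since the etcd process started, so the result is a lifetime figure, not a recent-window one.

```shell
# histogram_quantile_bound METRIC QUANTILE
# Reads Prometheus text-format metrics on stdin and prints the smallest bucket
# upper bound (the le label, in seconds) whose cumulative count covers
# QUANTILE of all observations. Prints "no-data" if the histogram is empty.
histogram_quantile_bound() {
  awk -v metric="$1" -v q="$2" '
    # Only consider bucket lines of the requested metric.
    index($0, metric "_bucket{") == 1 {
      if (match($0, /le="[^"]*"/)) {
        le = substr($0, RSTART + 4, RLENGTH - 5)   # text between the quotes
        n++
        bound[n] = le
        count[n] = $NF + 0                         # cumulative count
        if (le == "+Inf") total = $NF + 0          # +Inf bucket holds the total
      }
    }
    END {
      if (total == 0) { print "no-data"; exit }
      target = q * total
      for (i = 1; i <= n; i++) {
        if (count[i] >= target) { print bound[i]; exit }
      }
    }
  '
}
```

For example, piping the etcd metrics output into histogram_quantile_bound etcd_disk_wal_fsync_duration_seconds 0.99 prints a bucket bound such as 0.008, meaning that at least 99% of recorded fsyncs completed within 8ms.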

How to diagnose it

Confirm the cascade and find the root cause.

Confirm mutating API latency is elevated. Check apiserver_request_duration_seconds for POST, PUT, and PATCH verbs. If p99 is above 500ms sustained, the control plane is degrading. If it is above 1s, the cluster is in active failure.
Check etcd WAL fsync latency. Look at etcd_disk_wal_fsync_duration_seconds on the etcd metrics endpoint. In a healthy cluster, p99 is below 10ms. Above 100ms is critical. This is the root cause signal. If it is elevated, the problem is under etcd, not in the API server.
Check etcd leader stability. Look at etcd_server_leader_changes_seen_total. In a stable cluster, this counter should stay flat. If it is incrementing, followers are triggering elections because slow disk writes delay the leader's heartbeats (default interval 100ms) past the default 1000ms election timeout.
Check etcd database size versus quota. Run etcdctl endpoint status --write-out=table and compare DB SIZE to the configured --quota-backend-bytes (default 2GB). If the database is above 80% of quota, etcd is close to the limit. Once the quota is exceeded, etcd raises the NOSPACE alarm and accepts only reads and deletes until space is reclaimed and the alarm is disarmed.
Check API server inflight requests and 429 rate. Look at apiserver_current_inflight_requests and apiserver_request_total{code="429"}. If inflight is climbing toward the limit (default 200 mutating, 400 read-only) and 429s are appearing, the API server is saturated because it is waiting on etcd.
Check disk I/O on the etcd host. Run iostat -x 1 and look for high %util, elevated await, or queue depth near the device limit. If disk utilization is near 100%, the storage subsystem is the bottleneck. If the disk is network-attached, check for burst credit exhaustion.
Distinguish from admission webhook slowdown. Check apiserver_admission_webhook_admission_duration_seconds. If webhook latency is normal while mutating API latency is high, etcd is the culprit. If webhook latency is also elevated, the bottleneck may be a slow webhook instead.
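
The decision order in the steps above can be sketched as a small shell function. This is only an illustration: triage_write_latency and its output labels are hypothetical, the inputs are p99 values you have already collected (in whole milliseconds), and the 500ms and 10ms thresholds are the ones quoted in this guide.

```shell
# triage_write_latency API_P99_MS FSYNC_P99_MS WEBHOOK_SLOW
#   API_P99_MS    p99 of mutating API requests, whole milliseconds
#   FSYNC_P99_MS  p99 of etcd WAL fsync, whole milliseconds
#   WEBHOOK_SLOW  "yes" if admission webhook latency is also elevated
triage_write_latency() {
  tw_api_ms=$1
  tw_fsync_ms=$2
  tw_webhook_slow=$3

  if [ "$tw_api_ms" -le 500 ]; then
    echo "healthy"       # mutating p99 at or under the 500ms threshold
  elif [ "$tw_fsync_ms" -gt 10 ]; then
    echo "storage"       # fsync p99 above 10ms: look under etcd first
  elif [ "$tw_webhook_slow" = "yes" ]; then
    echo "webhook"       # fsync is fine but webhooks are slow
  else
    echo "etcd-other"    # check quota, leader changes, and network next
  fi
}
```

Calling triage_write_latency 900 45 no prints storage, which points at the disk under etcd before anything in the API server.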

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`etcd_disk_wal_fsync_duration_seconds`	WAL fsync is on the critical path of every write	p99 > 10ms trending upward
`etcd_disk_backend_commit_duration_seconds`	Backend commit affects read performance and compaction	p99 > 25ms sustained
`etcd_request_duration_seconds`	API server’s client-side view of etcd latency	p99 > 100ms for writes
`apiserver_request_duration_seconds` (mutating verbs)	End-to-end latency of writes through the API server	p99 > 500ms sustained
`etcd_server_leader_changes_seen_total`	Leader elections cause brief write outages	Any increase in a stable cluster
`etcd_mvcc_db_total_size_in_bytes`	Approaching quota causes write rejection	> 50% of `--quota-backend-bytes`
`apiserver_current_inflight_requests`	Indicates API server saturation	> 80% of configured limit
`apiserver_request_total{code="429"}`	Confirms APF or inflight saturation	Sustained rate above zero
`etcd_server_proposals_pending`	Rising value means Raft cannot reach consensus fast enough	Value increasing over time
`etcd_network_peer_round_trip_time_seconds`	High peer RTT causes leader instability	p99 > 1ms between peers
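
Several warning signs in the table are expressed as a percentage of a limit. A minimal sketch of that arithmetic follows. The helper names percent_of and quota_level are hypothetical, and the 50% and 80% cut-offs are the ones used in this guide for database size. LIMIT must be non-zero.

```shell
# percent_of USED LIMIT
# Prints USED as an integer percentage of LIMIT, rounded down.
percent_of() {
  echo $(( $1 * 100 / $2 ))
}

# quota_level DB_BYTES QUOTA_BYTES
# Prints ok, warning (above 50% of quota), or critical (above 80% of quota).
quota_level() {
  ql_pct=$(percent_of "$1" "$2")
  if [ "$ql_pct" -gt 80 ]; then
    echo "critical"
  elif [ "$ql_pct" -gt 50 ]; then
    echo "warning"
  else
    echo "ok"
  fi
}
```

With the default quota of 2147483648 bytes, quota_level 1500000000 2147483648 prints warning. The same percent_of helper applies to inflight requests, for example percent_of 170 200 for 170 mutating requests against a limit of 200.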

Fixes

If the cause is disk I/O saturation

Identify competing I/O workloads on the etcd host. If etcd shares a disk with the API server or with logging agents, move etcd to dedicated SSD or NVMe storage. Do not run etcd on network-attached storage in production. If the degraded disk belongs to the current leader and a healthy member is available, move leadership to it with etcdctl move-leader while the disk is repaired or replaced.

If the cause is database size or fragmentation

Compact old revisions with etcdctl compact, which requires a target revision (read the current one from etcdctl endpoint status --write-out=json), and then defragment one member at a time (followers first, leader last). Defragmentation blocks reads and writes on the member while it runs, so expect latency spikes. After freeing space, disarm any active NOSPACE alarm with etcdctl alarm disarm. Consider increasing --quota-backend-bytes if the cluster legitimately needs more than 2GB.
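
The compaction step needs the current revision, which the JSON form of endpoint status reports. The sketch below shows one way to extract it. current_revision is a hypothetical helper, and it assumes the compact JSON that etcdctl prints, with a "revision" field and no space after the colon. The mutating commands are left as comments because compacting to the current revision discards all older history and should be run deliberately.

```shell
# current_revision
# Reads `etcdctl endpoint status --write-out=json` output on stdin and prints
# the first "revision" value found.
current_revision() {
  grep -o '"revision":[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*'
}

# Intended sequence (commented out because these commands change cluster
# state; endpoint and TLS flags omitted for brevity):
#   rev=$(etcdctl endpoint status --write-out=json | current_revision)
#   etcdctl compact "$rev"
#   etcdctl defrag          # repeat per member: followers first, leader last
#   etcdctl alarm disarm    # only if a NOSPACE alarm is active
```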

If the cause is periodic compaction or defragmentation

Compaction causes predictable latency spikes. Ensure the Kubernetes API server --etcd-compaction-interval and etcd’s own --auto-compaction-retention are aligned and not conflicting. Schedule defragmentation during maintenance windows, not during peak load.

If the cause is network latency or partition

Check etcd_network_peer_round_trip_time_seconds between members. If RTT is above 1ms, investigate the network path. Ensure etcd members are deployed with odd cardinality (3 or 5) so the cluster can tolerate member loss without losing quorum. If the API server cannot reach etcd, verify network policies, firewalls, and certificate validity on the etcd client paths.
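
The odd-cardinality advice follows from Raft majority arithmetic, sketched below with two hypothetical helpers. A cluster of n members needs a majority of floor(n/2) + 1 to commit writes, so it tolerates floor((n-1)/2) failed members.

```shell
# quorum_size MEMBERS
# Prints the number of members that must be reachable to commit a write.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated_failures MEMBERS
# Prints how many members can fail before quorum is lost.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}
```

Both tolerated_failures 3 and tolerated_failures 4 print 1: the fourth member raises the quorum size to 3 without adding any fault tolerance, which is why 3 or 5 members are preferred.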

Prevention

Monitor etcd_disk_wal_fsync_duration_seconds with the same urgency as API server latency. Alert when p99 exceeds 10ms.
Keep etcd database size below 50% of quota. Track the trend and schedule compaction before reaching 75%.
Run etcd on dedicated local SSD or NVMe. Never share the disk with workloads, logging, or the API server if stacked.
Alert on any etcd leader change in a stable cluster. Even one per hour indicates disk or network stress.
Ensure client certificate rotation is working. Expired etcd client or peer certificates can appear as latency or connectivity failures.
Size API server inflight limits and APF concurrency shares to leave headroom for bursts. Sustained utilization above 50% of inflight capacity should trigger capacity review.
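
Alerting on leader changes means watching a counter for increases rather than reading its absolute value. Without a full monitoring stack, two samples taken some time apart are enough for a spot check. The sketch below uses two hypothetical helpers and assumes the counter is exposed without labels as a plain integer.

```shell
# metric_value NAME
# Reads Prometheus text-format metrics on stdin and prints the value of the
# first unlabelled sample called NAME.
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# counter_increase PREVIOUS CURRENT
# Prints how much a counter grew between two samples. If CURRENT is lower
# than PREVIOUS, the process restarted and the counter began again from zero.
counter_increase() {
  if [ "$2" -lt "$1" ]; then
    echo "$2"
  else
    echo $(( $2 - $1 ))
  fi
}
```

Sampling etcd_server_leader_changes_seen_total with metric_value at the start and end of an hour and passing both values to counter_increase gives the number of leader changes in that hour. Any result above zero in a stable cluster is worth investigating.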

How Netdata helps

Correlate etcd_disk_wal_fsync_duration_seconds with apiserver_request_duration_seconds on the same timeline to confirm the cascade.
Track apiserver_current_inflight_requests and 429 rates alongside etcd metrics to watch saturation build before an outage.
Monitor disk I/O wait, utilization, and queue depth on etcd nodes to distinguish disk saturation from application-level slowdown.
Alert on etcd leader changes and database size trends.

flowchart TD
    A[Slow disk I/O] --> B[etcd WAL fsync delay]
    B --> C[Leader misses heartbeat]
    C --> D[Raft leader election]
    D --> E[Brief write unavailability]
    E --> F[API server mutating requests timeout]
    F --> G[Inflight requests accumulate]
    G --> H[429 Too Many Requests]
    H --> I[Controllers retry]
    I --> J[Amplified write load on etcd]
    J --> B
