Kubernetes kubelet goroutine leaks: detection and bisection

A kubelet goroutine leak is a slow-burn failure. The process stays up, the node often remains Ready for hours, and then PLEG timeouts start, sync loops lag, and the kubelet is eventually OOM-killed or unresponsive. By the time the node flips to NotReady, the original leak signature is usually obscured by secondary symptoms.

What this means

The kubelet spawns goroutines for pod workers, probes, API server watches, volume operations, PLEG relists, and CRI calls. In a healthy node, goroutine count correlates with pod density and returns to a stable floor when churn stops. A leak means goroutines are created but never exit, causing three cascading effects:

  1. Memory growth. Each goroutine consumes stack memory. At scale the aggregate RSS climb is material, and if the kubelet lacks headroom the kernel OOM killer terminates it.
  2. Scheduler pressure. Tens of thousands of leaked goroutines increase Go runtime overhead. The sync loop and PLEG slow down, pushing the node toward NotReady.
  3. Stalled subsystems. A leak tied to pod workers, probes, or watches can exhaust internal pools and block new operations even though the process is still running.

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Stuck pod worker goroutines | Goroutine count grows with pod churn; containers stuck in Terminating or Unknown | kubectl get pods -A --field-selector spec.nodeName=<node> for stuck workloads |
| Probe goroutine accumulation | High density of exec or HTTP probes with short intervals; prober stacks dominate pprof | Probe configuration and prober_probe_total rate |
| Watch or informer leaks | Goroutines in k8s.io/client-go/tools/cache or transport layers after API server blips | API server connectivity and rest_client_requests_total errors |
| Cadvisor housekeeping accumulation | Goroutines in housekeeping paths correlated with transient container failures | Container failure events and kubelet_pleg_relist_duration_seconds |
| CRI streaming / SPDY leaks | Stacks in spdystream or wsstream after heavy kubectl exec or kubectl logs usage | Recent interactive sessions and kubectl/kubelet version skew |

Quick checks

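Run these on the node. The curl examples against port 10255 assume the kubelet read-only port is enabled; many distributions disable it, and it does not serve /debug/pprof. Profiles come from the authenticated port 10250 or from kubectl get --raw through the API server node proxy.

# Check current goroutine count from kubelet metrics.
# The Go runtime collector names the gauge go_goroutines; the pattern also
# matches kubelet_goroutines if your pipeline exposes it under that name.
curl -s http://localhost:10255/metrics | grep -E '^(go|kubelet)_goroutines'

# Same check through the API server node proxy when the read-only port is disabled
kubectl get --raw "/api/v1/nodes/<nodename>/proxy/metrics" | grep -E '^(go|kubelet)_goroutines'

# Get a pprof summary of goroutines; debug=1 groups identical stacks with counts
kubectl get --raw "/api/v1/nodes/<nodename>/proxy/debug/pprof/goroutine?debug=1" | head -100

# Dump full goroutine stacks for offline analysis; debug=2 lists every goroutine with its state
kubectl get --raw "/api/v1/nodes/<nodename>/proxy/debug/pprof/goroutine?debug=2" > /tmp/kubelet-goroutines.txt

# Alternative: fetch pprof directly from the kubelet on the node.
# TOKEN is assumed to hold a bearer token authorized for nodes/proxy.
curl -sk -H "Authorization: Bearer $TOKEN" "https://localhost:10250/debug/pprof/goroutine?debug=2" > /tmp/kubelet-goroutines.txt

# Count exited containers that may be triggering housekeeping leaks (-q prints IDs only, no header line)
crictl ps -a --state exited -q | wc -l

# Check probe load on the node
curl -s http://localhost:10255/metrics | grep prober_probe_total

# Check API client errors from kubelet to the API server
curl -s http://localhost:10255/metrics | grep rest_client_requests_total | grep -E 'code="(4|5)'

# Check kubelet resident memory (-o -x picks the oldest process named exactly kubelet)
grep VmRSS /proc/$(pgrep -o -x kubelet)/status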
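
Once a debug=2 dump is saved, a quick first pass is to count goroutines by state. The helper below is a minimal sketch, not part of any kubelet tooling: goroutine_states is a hypothetical name, and it assumes the standard Go dump layout in which each goroutine starts with a header such as "goroutine 42 [chan receive, 5 minutes]:".

```shell
# goroutine_states FILE
# Count goroutines in a goroutine?debug=2 dump by state (chan receive, select, IO wait, ...).
goroutine_states() {
  awk '
    # Header line of one goroutine, e.g. "goroutine 42 [chan receive, 5 minutes]:"
    /^goroutine [0-9]+ \[/ {
      state = substr($0, index($0, "[") + 1)  # keep the text after the first "["
      sub(/,.*$/, "", state)                  # drop wait duration and extra flags
      sub(/\].*$/, "", state)                 # drop the closing "]:"
      count[state]++
    }
    END { for (s in count) print count[s], s }
  ' "$1" | sort -k1,1nr -k2
}
```

Run it as goroutine_states /tmp/kubelet-goroutines.txt. A state that holds most of the total and keeps growing between two dumps is a reasonable place to start reading stacks.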

How to diagnose it

  1. Establish a baseline. Record kubelet_goroutines at the current pod count. On a medium-density node, a floor of 100-500 goroutines is normal. If the count climbs while pod count is flat, or exceeds 1,000 without matching workload growth, treat it as a leak.

  2. Capture a goroutine profile. Pull /debug/pprof/goroutine?debug=2 from the kubelet. The dump lists every goroutine individually with its state and full stack; debug=1 instead groups identical stacks and prints a count for each. Look for hundreds of identical stacks in prober, watch, pod_worker, or housekeeping. If the same function accounts for a large fraction of total goroutines, you have isolated the leak family. Save this file before restarting anything.

  3. Correlate the dominant stack with recent events. A stack rooted in spdystream suggests a kubectl exec or port-forward session was not torn down cleanly. A stack in housekeeping usually follows a burst of container failures. Stacks in client-go cache or transport code point to watch reconnections after API server disruptions. Match the leak onset to deployments, control plane restarts, or interactive sessions. For example, a sudden jump after a control plane upgrade points to a watch leak, while a steady climb during a batch job with frequent exec probes points to prober accumulation.

  4. Check for upstream bugs. Several cadvisor housekeeping and SPDY stream leaks have been documented and patched. Search the Kubernetes issue tracker for your kubelet version and the dominant stack signature. If you find a matching issue, confirm the fix version in the changelog before scheduling an upgrade. If the stack is novel, you are looking at a candidate for an upstream bug report.

  5. Validate the fix or workaround. Restarting the kubelet clears leaked goroutines immediately, but the leak will recur if the trigger persists. Warning: restarting the kubelet is disruptive. Running containers keep running, but the node may briefly flip to NotReady, and pods can be evicted and rescheduled if it stays NotReady beyond their tolerations.

    After restart, watch kubelet_goroutines under the same workload pattern. If the count stabilizes, the trigger was external, such as a stuck pod or a brief API server partition. If it climbs again with the same stack signature, the trigger is intrinsic and requires a config change or an upstream fix.
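
Step 2 asks for the function that accounts for most goroutines. One way to approximate that from a saved debug=2 dump is to count goroutines by their top stack frame. This is a rough sketch under assumptions: top_frames is a hypothetical helper name, and it relies on the first line after each goroutine header being the innermost frame.

```shell
# top_frames FILE
# Count goroutines in a goroutine?debug=2 dump by their top stack frame.
top_frames() {
  awk '
    # Header line of one goroutine, e.g. "goroutine 42 [select, 3 minutes]:"
    /^goroutine [0-9]+ \[/ { want = 1; next }
    # The first line after the header names the innermost function.
    want {
      want = 0
      if (NF == 0) next              # header with no frames below it
      frame = $0
      sub(/\([^()]*\)$/, "", frame)  # strip the trailing argument list
      count[frame]++
    }
    END { for (f in count) print count[f], f }
  ' "$1" | sort -k1,1nr -k2
}
```

Run it as top_frames /tmp/kubelet-goroutines.txt | head -20. Goroutines blocked on I/O or locks often share a generic top frame, so treat the result as a pointer into the dump and read further down the matching stacks before naming a leak family.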

flowchart TD
    A[Goroutine count above baseline] --> B{Is pod count stable?}
    B -->|No| C[Expected scaling; monitor rate]
    B -->|Yes| D[Capture pprof debug=2]
    D --> E{Dominant stack?}
    E -->|prober| F[Reduce exec probes and interval]
    E -->|watch or cache| G[Check API server connectivity]
    E -->|housekeeping| H[Fix container crash loop]
    E -->|spdystream or wsstream| I[Check kubectl exec usage and version skew]
    E -->|pod_worker| J[Find stuck pods and orphaned containers]
    F --> K[Restart kubelet and verify count stabilizes]
    G --> K
    H --> K
    I --> K
    J --> K
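
The branches of the flowchart can also be written as a small lookup that maps a stack signature to a leak family and next action. The sketch below is illustrative only: leak_family is a hypothetical name, and the glob patterns are assumptions based on the stack names used in this guide, not an exhaustive list.

```shell
# leak_family STACK_SIGNATURE
# Map a frame or stack excerpt to the leak family and next action from the flowchart.
leak_family() {
  case "$1" in
    *spdystream*|*wsstream*)
      echo "streaming: check kubectl exec usage and version skew" ;;
    *housekeeping*)
      echo "housekeeping: fix container crash loop" ;;
    *pod_worker*)
      echo "pod_worker: find stuck pods and orphaned containers" ;;
    *prober*)
      echo "prober: reduce exec probes and interval" ;;
    *client-go/tools/cache*|*watch*)
      echo "watch or cache: check API server connectivity" ;;
    *)
      echo "unknown: read the full stack and search upstream issues" ;;
  esac
}
```

For example, leak_family 'k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch' prints the watch or cache branch. Patterns are checked top to bottom, so the more specific ones come first.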

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| kubelet_goroutines | Direct indicator of internal concurrency and leaks | Sustained growth of more than 100 over baseline, or a total above 500 on medium-density nodes |
| kubelet_pleg_relist_duration_seconds | Leaks slow kubelet internals, and a slow runtime can itself strand goroutines | p99 relist above 10 seconds and climbing |
| kubelet_sync_loop_duration_seconds | Leaked goroutines compete for scheduler time and can stall reconciliation | Sustained duration above 30 seconds |
| prober_probe_total | High probe density creates goroutine pressure that leaks amplify | Probe rate more than 2× baseline without corresponding pod growth |
| rest_client_requests_total | Errors and watch churn spawn orphaned goroutines | Sustained 4xx or 5xx responses from kubelet to the API server |
| process_resident_memory_bytes (kubelet) | Goroutines consume stack memory; RSS growth confirms material impact | RSS climbing in lockstep with goroutine count |
| Container restart count on node | Crash loops trigger housekeeping goroutine accumulation | More than 10 restarts per minute for the same container |
| kubelet_runtime_operations_errors_total | CRI errors can strand goroutines waiting on runtime responses | Sustained increase in container create or start errors |
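
The goroutine warning sign in the table can be checked with a few lines of shell. This is a minimal sketch: goroutine_count and goroutine_signal are hypothetical helper names, the gauge is assumed to appear as go_goroutines or kubelet_goroutines in the metrics text, and a single comparison does not prove the growth is sustained.

```shell
# goroutine_count: read Prometheus text metrics on stdin and print the goroutine gauge.
goroutine_count() {
  awk '$1 == "go_goroutines" || $1 == "kubelet_goroutines" { printf "%d\n", $2; exit }'
}

# goroutine_signal BASELINE CURRENT
# Print "warn" when CURRENT is more than 100 above BASELINE or above 500 in total,
# otherwise "ok". Both thresholds are the starting points from the table.
goroutine_signal() {
  baseline=$1
  current=$2
  if [ $((current - baseline)) -gt 100 ] || [ "$current" -gt 500 ]; then
    echo warn
  else
    echo ok
  fi
}
```

A typical use is to store the baseline when the node is healthy, then compare a fresh reading, for example goroutine_signal 300 "$(curl -s http://localhost:10255/metrics | goroutine_count)". Repeat the comparison over several scrapes before treating a warn as a leak.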

Fixes

If the cause is stuck pod workers

Look for pods stuck in Terminating or containers in an unknown state. Use crictl ps -a to identify orphaned containers, and crictl inspect on those containers to verify whether the runtime still holds the sandbox. If the runtime has already released the container but the API server still shows the pod, force-delete the pod object only as a last resort. Warning: force-deleting a pod while the kubelet is active can strand volumes and containers.

If the kubelet sync loop is blocked on the stuck worker, a kubelet restart is the fastest recovery, but capture the pprof dump first.

If the cause is probe overload

Reduce probe frequency and avoid sub-5-second intervals. Replace exec probes with httpGet where possible; exec probes fork processes and increase kubelet load. Increase initialDelaySeconds to prevent a startup storm of probe goroutines.
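
As a hedged illustration of those settings, a container spec fragment along these lines replaces an exec probe with httpGet and keeps the period well above 5 seconds. The path, port, and timing values are placeholder assumptions to adapt per application.

```yaml
# Illustrative fragment of a container spec; /healthz, 8080, and the timings are placeholders.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15   # spread out the startup burst of probes
  periodSeconds: 10         # stays clear of sub-5-second intervals
  timeoutSeconds: 2
  failureThreshold: 3
```

The same fields apply to readinessProbe and startupProbe.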

If the cause is watch or informer leaks

Stabilize API server connectivity first. Watches that reconnect in a loop spawn goroutines that may not be reclaimed if the connection is torn down uncleanly. Restart kubelet to clear the backlog. Warning: restarting kubelet is disruptive and may cause brief node NotReady status.

If the leak reproduces while the API server is healthy, look for a matching upstream client-go or kubelet issue and upgrade the kubelet to a release that contains the fix.

If the cause is cadvisor or housekeeping

This pattern is tied to transient container failures. Fix the underlying application crash loop, then restart kubelet to clear the accumulated housekeeping goroutines. There is no runtime command to flush them individually.

If the cause is CRI streaming leaks

Limit automated kubectl exec and kubectl logs usage. Heavy interactive streaming can strand SPDY or websocket goroutines if the client or server version is outside the supported skew window. Keep kubectl within one minor version of kube-apiserver, and keep the kubelet no newer than kube-apiserver and inside the skew range supported for your release.

When to file an upstream bug

File a Kubernetes issue when the goroutine dump shows clear accumulation in kubelet-internal code, the leak reproduces on the latest supported patch release, and it is not explained by a stuck workload or an already-reported issue. Attach the pprof debug=2 output, the kubelet version, the container runtime version, and the trigger event.

Prevention

  • Baseline kubelet_goroutines per node density and alert on the rate of change rather than a static threshold.
  • Cap probe density. Avoid exec probes with short periods and limit the total number of active probes per node.
  • Monitor container restart rates. Crash loops are a leading trigger for housekeeping goroutine accumulation.
  • Keep kubelet, kubectl, and the control plane within supported version skew.
  • Scrape kubelet metrics continuously so a gradual 10 percent daily growth in goroutines surfaces before it becomes an outage.
  • Review probe budgets during application deploys; a spike in prober_probe_total often precedes a leak by hours.
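
Alerting on rate of change comes down to simple arithmetic on two samples. The sketch below extrapolates growth linearly to a percent-per-day figure; growth_pct_per_day is a hypothetical helper, and in practice a monitoring system would compute this over a longer window.

```shell
# growth_pct_per_day PREVIOUS_COUNT CURRENT_COUNT ELAPSED_SECONDS
# Print goroutine growth as percent per day, extrapolated linearly from two samples.
growth_pct_per_day() {
  awk -v prev="$1" -v cur="$2" -v secs="$3" 'BEGIN {
    if (prev <= 0 || secs <= 0) { print "invalid input"; exit 1 }
    printf "%.1f\n", (cur - prev) / prev * 100 * 86400 / secs
  }'
}
```

For example, growth_pct_per_day 400 420 43200 compares two samples taken 12 hours apart and reports 10.0, the gradual daily growth described above. A value that stays positive across several days while pod count is flat is the pattern to alert on.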

How Netdata helps

Netdata collects kubelet_goroutines, kubelet_pleg_relist_duration_seconds, kubelet memory, and container restart counts, which places leak growth, PLEG latency, and RSS pressure on one timeline. Anomaly detection on goroutine count catches gradual leaks that static thresholds miss, and the Kubernetes node view lets you compare counts across the fleet to isolate node-local triggers from cluster-wide regressions.