Kubernetes API server memory pressure: OOM cycle and tuning

Your control plane is crash-looping. kube-apiserver's memory use climbs toward its container memory limit, garbage collection consumes more and more CPU trying to reclaim space, and the kernel OOM killer terminates the process. The replacement pod starts with cold caches. Every connected client immediately re-lists the resources it watches, and memory spikes again before the caches warm. The root cause is usually a mismatch between how much memory the API server needs, including restart bursts, and how much headroom you have given it.

What this means

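The Kubernetes API server keeps cluster state in an in-memory watch cache for each resource type. LIST responses are assembled in memory before transmission, and every watch connection holds buffers for event serialization. Memory is an incompressible resource: unlike CPU, it cannot be throttled, only freed or exhausted.

As the API server nears its container memory limit, latency rises. If GOMEMLIMIT is set, the Go runtime collects garbage increasingly often as it approaches that soft limit. This shows up as CPU burn, latency spikes, and request stalls. Without GOMEMLIMIT, the runtime does not know about the container limit at all, so the heap can grow past it between collections. Either way, once usage crosses the container's cgroup memory limit, the kernel OOM killer terminates kube-apiserver.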

The crash is only the beginning. When the API server restarts, its watch caches are empty. Controllers, kubelets, and operators reconnect at the same time and issue full LIST requests to re-sync state. Lists that the cold watch cache cannot yet serve go to etcd. The new instance holds these responses in memory while it also populates its caches. If the memory limit is still too low, the replacement is OOM-killed too. The result is a crash loop that can leave the control plane unavailable.

kube-apiserver has no built-in memory cap and no memory-based load shedding. It relies on the container's cgroup limit, which the kernel enforces with the OOM killer. There is no graceful degradation: the transition from slow GC to a killed process is a cliff.

flowchart TD
    A[Watch cache growth or large LIST] --> B[Memory RSS climbs toward limit]
    B --> C[Go GC runs more frequently and longer]
    C --> D[API latency spikes and inflight requests back up]
    D --> E[Kernel OOM killer terminates kube-apiserver]
    E --> F[All watches disconnect]
    F --> G[Clients reconnect and re-list simultaneously]
    G --> H[Cold caches plus LIST storms spike memory 2-3x]
    H --> I[Replacement instance OOMs before stabilizing]
    I --> B

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Container memory limit set too low for cluster size | RSS plateaus near the limit, followed by OOM kills in dmesg | process_resident_memory_bytes versus the container memory limit |
| CRD proliferation or object growth without memory scaling | Memory grows steadily over days; watch cache item counts climb | apiserver_storage_objects and whether new resource types were added |
| Post-restart re-list storm | Memory spikes 2-3x immediately after an OOM kill or rolling restart | LIST request rate (apiserver_request_total{verb="LIST"}) and container restart count |
| Memory leak in admission webhook or custom code | Goroutine count and heap grow together without cluster growth | go_goroutines and a pprof heap profile |
| Large un-paginated LIST requests | Sudden memory spikes correlating with elevated LIST latency | apiserver_response_sizes and client request patterns |

Quick checks

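# Check API server resident memory (in HA clusters, --raw metrics come from
# whichever API server instance the load balancer routes you to)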
kubectl get --raw /metrics | grep ^process_resident_memory_bytes

# Check for recent OOM kills in the kernel log (requires node access)
dmesg -T | grep -i "oom\|killed process"

# Total object count per resource type
kubectl get --raw /metrics | grep ^apiserver_storage_objects

# Go GC pause duration (pressure indicator)
kubectl get --raw /metrics | grep ^go_gc_duration_seconds

# Goroutine count (leak indicator)
kubectl get --raw /metrics | grep ^go_goroutines

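# API server container restart count and last termination reason
# (OOMKilled confirms the kernel killed the process)
kubectl -n kube-system get pods -l component=kube-apiserver \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'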

# LIST request rate (re-list storm detection)
kubectl get --raw /metrics | grep 'apiserver_request_total.*verb="LIST"'

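# Capture a heap profile (safe; may add brief load). Goes through the
# authenticated API path; needs --profiling (on by default) and RBAC access
# to the /debug/pprof non-resource URL.
kubectl get --raw /debug/pprof/heap > /tmp/apiserver-heap.pprof
go tool pprof -top /tmp/apiserver-heap.pprof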
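The apiserver_storage_objects check above prints one line per resource type, which is hard to scan on a large cluster. The sketch below ranks those lines by count. It assumes the Prometheus text format with a resource label. top_storage_objects is a hypothetical helper name, not part of kubectl. Values are normalized through awk because the exposition format prints large numbers in exponent form (for example 1.234567e+06), which `sort -n` would misorder.

```shell
#!/bin/sh
# top_storage_objects [N]: read Prometheus text on stdin and print the N
# resource types with the most stored objects (default 10), largest first.
# Hypothetical helper; expects lines such as:
#   apiserver_storage_objects{resource="pods"} 1234
top_storage_objects() {
  n=${1:-10}
  awk '
    /^apiserver_storage_objects[{]/ {
      # Extract the value of the resource="..." label.
      if (match($0, /[{,]resource="[^"]*"/)) {
        res = substr($0, RSTART + 11, RLENGTH - 12)
        # $NF is the sample value; "%.0f" turns exponent form into an integer.
        printf "%.0f %s\n", $NF, res
      }
    }' | sort -rn | head -n "$n"
}

# Usage against a live cluster:
#   kubectl get --raw /metrics | top_storage_objects 5
```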

How to diagnose it

  1. Confirm the OOM cycle. Look at process_resident_memory_bytes for a saw-tooth pattern: memory climbs to the limit, drops sharply, and repeats. Correlate the drops with OOM events in dmesg and with container restart counts. Two signals together confirm that the kernel is killing the process rather than a panic or clean shutdown: a restart count that climbs with each memory peak, and a last termination reason of OOMKilled (exit code 137) in the container status. If restarts do not correlate with OOM events, look for panics or liveness probe failures instead.

  2. Identify the memory consumer. Compare go_memstats_heap_inuse_bytes to process_resident_memory_bytes. If the heap tracks RSS closely, Go objects such as the watch cache and in-flight request buffers are the primary consumers. Check apiserver_storage_objects to see which resource types dominate. If the heap is much smaller than RSS, suspect fragmentation or memory outside live heap objects. That includes idle heap the runtime has not yet returned to the OS (go_memstats_heap_idle_bytes minus go_memstats_heap_released_bytes) and goroutine stacks (go_memstats_stack_inuse_bytes). A dominant resource type tells you to tune caching; a leak pattern tells you to capture profiles.

  3. Check for a re-list storm after restart. After an OOM kill or rolling restart, watch for a spike in apiserver_request_total{verb="LIST"} and elevated LIST latency. Until the watch cache is warm, many of these LIST requests are served from etcd. A sudden surge means clients are re-syncing en masse. If you see this, size the memory limit for the warm-up burst, not just the steady state, or the next restart will repeat the crash.

  4. Check for goroutine leaks. If go_goroutines climbs steadily without a matching increase in watch connections or traffic, a leak is consuming memory through goroutine stacks and the objects they retain. This is more likely after a change to a custom admission webhook or controller integration. A flat goroutine count makes a goroutine leak unlikely and points toward cache or object growth instead.

  5. Audit recent changes. Review recent additions of MutatingWebhookConfiguration, ValidatingWebhookConfiguration, or CRDs. Each new webhook adds per-request work and allocation, and each new resource type adds a dedicated watch cache. Every API server instance holds its own copy of each watch cache. If growth started after a deployment or CRD installation, you have found the trigger.
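Step 2's comparison of heap and RSS can be done from a single scrape. The sketch below assumes the standard Go collector metric names exposed on /metrics (go_memstats_heap_inuse_bytes, go_memstats_heap_idle_bytes, go_memstats_heap_released_bytes) plus process_resident_memory_bytes. heap_rss_ratio is a hypothetical name. It reports numbers rather than a verdict, because what counts as "much smaller" depends on your own baseline.

```shell
#!/bin/sh
# heap_rss_ratio: read Prometheus text on stdin and summarize how much of the
# resident set is Go heap in use, plus idle heap the runtime has not yet
# returned to the OS. Hypothetical helper built on standard Go collector names.
heap_rss_ratio() {
  awk '
    $1 == "go_memstats_heap_inuse_bytes"    { heap = $2 + 0 }
    $1 == "go_memstats_heap_idle_bytes"     { idle = $2 + 0 }
    $1 == "go_memstats_heap_released_bytes" { released = $2 + 0 }
    $1 == "process_resident_memory_bytes"   { rss = $2 + 0 }
    END {
      if (rss <= 0) { print "process_resident_memory_bytes not found"; exit 1 }
      printf "heap_inuse=%.0f rss=%.0f heap_pct_of_rss=%.0f idle_unreleased=%.0f\n",
        heap, rss, 100 * heap / rss, idle - released
    }'
}

# Usage against a live cluster:
#   kubectl get --raw /metrics | heap_rss_ratio
```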

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| process_resident_memory_bytes | Resident memory is the closest proxy for the cgroup usage the OOM killer enforces | > 80% of container limit |
| go_memstats_heap_inuse_bytes | Active heap holds watch caches and deserialized objects | Growing steadily over days |
| go_gc_duration_seconds | GC frequency and pause time rise under memory pressure | Max pause (quantile="1") > 100ms, or a sustained rise in the GC cycle rate (go_gc_duration_seconds_count). This summary exposes only the 0, 0.25, 0.5, 0.75, and 1 quantiles, so there is no p99. |
| apiserver_storage_objects | Number of stored objects per resource type | Growth without corresponding cleanup |
| go_goroutines | Goroutine leaks retain stack memory and prevent collection | Sustained growth without traffic increase |
| apiserver_request_total{verb="LIST"} | Re-list storms after a restart or after watches expire | Sudden spike after OOM or API server restart |
| Container restart count | OOM kills appear as container restarts | > 1 restart in 5 minutes |
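The > 80% warning sign for process_resident_memory_bytes can be checked from a shell, for example in a cron job or CI gate. This is a sketch under stated assumptions:

- check_rss_headroom is a hypothetical helper.
- LIMIT_BYTES is the container memory limit you pass in; the script does not discover it.
- In HA clusters, `kubectl get --raw /metrics` reports whichever instance the load balancer reaches.

```shell
#!/bin/sh
# check_rss_headroom LIMIT_BYTES [WARN_PCT]: read Prometheus text on stdin and
# compare process_resident_memory_bytes with the container memory limit.
# Exit status: 0 = OK, 1 = above WARN_PCT (default 80), 2 = missing input.
# Hypothetical helper; LIMIT_BYTES must come from your pod spec.
check_rss_headroom() {
  [ -n "$1" ] || { echo "usage: check_rss_headroom LIMIT_BYTES [WARN_PCT]" >&2; return 2; }
  awk -v limit="$1" -v warn="${2:-80}" '
    $1 == "process_resident_memory_bytes" { rss = $2 + 0; found = 1 }
    END {
      if (!found || limit + 0 <= 0) { print "UNKNOWN: RSS metric or limit missing"; exit 2 }
      pct = 100 * rss / limit
      if (pct > warn + 0) { printf "WARN: RSS at %.1f%% of limit\n", pct; exit 1 }
      printf "OK: RSS at %.1f%% of limit\n", pct
    }'
}

# Usage (limit of 8 GiB):
#   kubectl get --raw /metrics | check_rss_headroom 8589934592
```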

Fixes

If the cause is memory limits set too low

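Increase the API server container memory limit immediately. This is the fastest way to break the crash loop. Then set GOMEMLIMIT to approximately 90% of the container memory limit. GOMEMLIMIT makes the Go runtime collect garbage more aggressively as it approaches that value, which often keeps the kernel OOM killer from acting first. It is a soft limit, though. If the live heap itself exceeds it, the runtime spends a large share of CPU on collection and then lets the heap grow past it anyway. It buys time, but it does not replace an adequate container limit.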
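The 90% rule is simple arithmetic, but the units are easy to get wrong. The sketch below assumes the container limit is known in bytes. gomemlimit_for is a hypothetical helper. It emits a MiB suffix, which GOMEMLIMIT accepts.

```shell
#!/bin/sh
# gomemlimit_for LIMIT_BYTES [PCT]: print a GOMEMLIMIT value that is PCT percent
# (default 90) of a container memory limit given in bytes. Hypothetical helper.
gomemlimit_for() {
  [ -n "$1" ] || { echo "usage: gomemlimit_for LIMIT_BYTES [PCT]" >&2; return 2; }
  awk -v limit="$1" -v pct="${2:-90}" 'BEGIN {
    # Round down so the soft limit never exceeds the requested fraction.
    # GOMEMLIMIT accepts binary suffixes such as MiB and GiB.
    printf "%dMiB\n", int(limit * pct / 100 / 1048576)
  }'
}

# Usage: a limit of 8 GiB (8589934592 bytes)
#   gomemlimit_for 8589934592    # prints 7372MiB
```

The printed value goes into the kube-apiserver container's environment as GOMEMLIMIT. Where that lives depends on how your control plane is deployed; on kubeadm clusters, it is the static pod manifest in /etc/kubernetes/manifests.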

If the cause is watch cache growth

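Evaluate whether every resource type needs an in-memory watch cache. The API server's --watch-cache-sizes flag takes comma-separated resource[.group]#size entries. The only meaningful size is 0, which disables the watch cache for that resource; any non-zero value simply leaves caching enabled. The flag applies only to resources built into the API server, not to CRDs or aggregated APIs. For example, --watch-cache-sizes=pods#0 disables the pod watch cache entirely. The tradeoff is that uncached LIST and WATCH requests are served from etcd, which increases read latency and etcd load. In HA deployments, every API server instance maintains its own watch cache. Total memory therefore grows with the number of replicas, and adding replicas does not shrink what each instance needs.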
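The flag syntax is easy to get wrong by hand, especially the group suffix. The sketch below builds the flag value from a list of resources. watch_cache_disable_flag is a hypothetical helper. The resource names in the usage line are placeholders, not a recommendation to disable those caches.

```shell
#!/bin/sh
# watch_cache_disable_flag RESOURCE...: print a --watch-cache-sizes flag that
# disables the watch cache (size 0) for each built-in resource given.
# Resources use the resource[.group] form: "events" for the core group,
# "replicasets.apps" for a named group. Hypothetical helper.
watch_cache_disable_flag() {
  [ "$#" -gt 0 ] || { echo "usage: watch_cache_disable_flag RESOURCE..." >&2; return 2; }
  flag="--watch-cache-sizes="
  sep=""
  for r in "$@"; do
    flag="${flag}${sep}${r}#0"
    sep=","
  done
  printf '%s\n' "$flag"
}

# Usage:
#   watch_cache_disable_flag events replicasets.apps
#   # --watch-cache-sizes=events#0,replicasets.apps#0
```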

If the cause is post-restart re-list storms

Ensure the memory limit can absorb the warm-up burst. Memory can spike 2-3x above the steady-state baseline during re-list storms. If the instance is crash-looping, raise the limit temporarily to let caches populate. Then investigate why clients are forcing full re-lists. On releases that cannot serve consistent lists from the watch cache (the ConsistentListFromCache feature), LIST requests with no resourceVersion bypass the cache and are served directly from etcd. Slow watchers are another source. When a watcher falls behind and fills its bounded server-side buffer, the API server closes that watch. The client then reconnects and may have to re-list, adding to the storm.
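To see which resource types are being re-listed, the LIST counters can be summed per resource. The sketch below assumes the usual label set on apiserver_request_total. list_requests_by_resource is a hypothetical helper. The counters reset when the instance restarts, so right after an OOM kill the totals mostly reflect the re-list itself. The metric does not identify clients; it narrows the search to resource types, and the user agents behind them can then be found in audit logs.

```shell
#!/bin/sh
# list_requests_by_resource: read Prometheus text on stdin and sum the
# apiserver_request_total counters with verb="LIST" per resource, largest first.
# Counters are cumulative since the instance started. Hypothetical helper.
list_requests_by_resource() {
  awk '
    /^apiserver_request_total[{]/ && /verb="LIST"/ {
      res = "(none)"
      # Anchor on "{" or "," so the subresource label is not matched.
      if (match($0, /[{,]resource="[^"]*"/)) res = substr($0, RSTART + 11, RLENGTH - 12)
      total[res] += $NF
    }
    END { for (r in total) printf "%.0f %s\n", total[r], r }' | sort -rn
}

# Usage against a live cluster:
#   kubectl get --raw /metrics | list_requests_by_resource | head
```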

If the cause is a memory leak

Capture a pprof heap profile and goroutine dump from the /debug/pprof endpoints, for example with kubectl get --raw /debug/pprof/heap and kubectl get --raw '/debug/pprof/goroutine?debug=2'. Profiling is enabled by default through the --profiling flag. If the leak correlates with a specific admission webhook or aggregated API server, remove or upgrade the offending component. If the leak is in core kube-apiserver, plan a patch upgrade. Do not keep raising the limit without identifying the consumer.

If the cause is large un-paginated LIST requests

Audit clients, controllers, and CI pipelines for cluster-scoped LIST calls that do not use the limit and continue parameters. The API server assembles the full response in memory before transmission, so a single LIST touching thousands of large objects can temporarily consume gigabytes of RAM. kubectl get already paginates by default (--chunk-size, default 500), but direct API calls without a limit parameter are not paginated. Enforce pagination at API gateways or via client configuration.
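For scripts that call the API directly, pagination means passing limit and then following metadata.continue until it is empty. The sketch below assumes kubectl is configured and the server returns compact JSON. list_paginated and fetch_page are hypothetical names. The sed-based token extraction is a stand-in for a real JSON parser.

```shell
#!/bin/sh
# fetch_page URL: thin wrapper around kubectl so the loop can be exercised
# without a cluster. Hypothetical helper.
fetch_page() { kubectl get --raw "$1"; }

# list_paginated API_PATH [LIMIT]: fetch a collection page by page using the
# limit and continue query parameters, printing each page's JSON on one line.
list_paginated() {
  api_path=$1
  limit=${2:-500}
  token=""
  while :; do
    if [ -n "$token" ]; then
      page=$(fetch_page "${api_path}?limit=${limit}&continue=${token}") || return 1
    else
      page=$(fetch_page "${api_path}?limit=${limit}") || return 1
    fi
    printf '%s\n' "$page"
    # metadata.continue is present only while more pages remain. The pattern
    # only looks before the first "[", i.e. in the list metadata, not in items.
    token=$(printf '%s' "$page" | sed -n 's/^[^[]*"continue":"\([^"]*\)".*/\1/p')
    [ -n "$token" ] || break
  done
}

# Usage:
#   list_paginated /api/v1/pods 500 > pods-pages.jsonl
```

Continue tokens are URL-safe base64, so they can be appended without extra encoding. A token expires if etcd compacts past its revision. The server then returns 410 Gone, and the listing has to start over.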

Prevention

  • Size for burst, not steady state. Allocate enough memory to survive 2-3x baseline RSS while caches warm after a restart.
  • Monitor the post-GC baseline. The minimum of go_memstats_heap_inuse_bytes over a long window approximates the heap left after collection. A rising baseline can predict OOM days in advance, while peak RSS alone is noisy.
  • Set GOMEMLIMIT. Configure it to roughly 90% of the container memory limit so the Go runtime collects more aggressively before the kernel intervenes.
  • Track object counts per resource type. Use apiserver_storage_objects to detect accumulation of events, completed jobs, or orphaned CRD instances.
  • Review CRD and webhook sprawl. Each new resource type and admission webhook adds memory overhead that should trigger a capacity review.
  • Enforce pagination. Block or discourage unbounded LIST requests from automation and dashboards.
  • Test restart behavior. Measure informer sync time and memory consumption after a controlled API server restart to validate your headroom.
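Sizing for burst is a multiplication, but basing it on the RSS baseline you actually measure keeps the limit honest. The sketch below assumes you have a steady-state RSS baseline in bytes, for example process_resident_memory_bytes on a warm instance. min_limit_for_burst is a hypothetical helper, and its default multiplier of 3 is the top of the 2-3x range above. It prints a Kubernetes quantity (Mi), not a GOMEMLIMIT value.

```shell
#!/bin/sh
# min_limit_for_burst BASELINE_BYTES [MULTIPLIER]: print the smallest container
# memory limit, rounded up to whole MiB, that covers MULTIPLIER times the
# steady-state RSS baseline (default 3, the top of the 2-3x warm-up range).
# Output uses the Kubernetes "Mi" quantity suffix. Hypothetical helper.
min_limit_for_burst() {
  [ -n "$1" ] || { echo "usage: min_limit_for_burst BASELINE_BYTES [MULTIPLIER]" >&2; return 2; }
  awk -v base="$1" -v mult="${2:-3}" 'BEGIN {
    mib = base * mult / 1048576
    r = int(mib)
    if (r < mib) r++   # round up so the limit never falls short of the target
    printf "%dMi\n", r
  }'
}

# Usage: a 2 GiB steady-state baseline
#   min_limit_for_burst 2147483648    # prints 6144Mi
```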

How Netdata helps

  • Correlates process_resident_memory_bytes, go_memstats_heap_inuse_bytes, and container restart events on the same timeline to confirm an OOM cycle.
  • Surfaces object counts per resource type so you can pinpoint which CRD or object type is driving growth.
  • Tracks go_gc_duration_seconds alongside API request latency to show when memory pressure degrades response times before the OOM kill.
  • Alerts on control plane container OOM kills and restart loops.
  • Visualizes LIST request rate spikes that precede or follow OOM events, connecting the crash to the re-list storm.