Kubernetes API server memory pressure: OOM cycle and tuning
Your control plane is crash-looping. The kube-apiserver process climbs toward its container memory limit, the Go garbage collector stalls trying to reclaim space, and the kernel OOM killer terminates it. The replacement pod starts with cold caches; every connected client immediately re-lists watched resources, and memory spikes again before caches warm. The root cause is usually a mismatch between how the API server uses memory and how much headroom you have given it.
What this means
The Kubernetes API server keeps cluster state in an in-memory watch cache for each resource type. LIST responses are assembled in memory before transmission, and every watch connection holds buffers for event serialization. Memory is incompressible: unlike CPU, it cannot be throttled when demand exceeds supply. As the API server approaches its container memory limit, the Go runtime spends more time in garbage collection, which shows up as latency spikes and request stalls. Eventually the kernel OOM killer steps in. When the container's cgroup limit is hit, it kills the process inside that cgroup; under node-wide memory pressure it favors the largest consumer, which is usually kube-apiserver.
The crash is only the beginning. When the API server restarts, its watch caches are empty. Controllers, kubelets, and operators reconnect simultaneously and issue full LIST requests to re-sync state. These lists hit etcd directly because the watch cache is cold. The new instance buffers every response in memory while populating caches. If the memory limit is still too low, the replacement OOMs, creating a crash loop that can render the control plane unavailable.
The API server provides no built-in hard memory limit. It relies on the container runtime cgroup and then on the kernel OOM killer. There is no graceful degradation. The transition from slow GC to killed process is a cliff.
```mermaid
flowchart TD
    A[Watch cache growth or large LIST] --> B[Memory RSS climbs toward limit]
    B --> C[Go GC runs more frequently and longer]
    C --> D[API latency spikes and inflight requests back up]
    D --> E[Kernel OOM killer terminates kube-apiserver]
    E --> F[All watches disconnect]
    F --> G[Clients reconnect and re-list simultaneously]
    G --> H[Cold caches plus LIST storms spike memory 2-3x]
    H --> I[Replacement instance OOMs before stabilizing]
    I --> B
```

Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Container memory limit set too low for cluster size | RSS flatlines near the limit, then OOM kills in dmesg | process_resident_memory_bytes versus the container memory limit |
| CRD proliferation or object growth without memory scaling | Memory grows steadily over days; watch cache item counts climb | apiserver_storage_objects and whether new resource types were added |
| Post-restart re-list storm | Memory spikes 2-3x immediately after an OOM kill or rolling restart | LIST request rate (apiserver_request_total{verb="LIST"}) and container restart count |
| Memory leak in admission webhook or custom code | Goroutine count and heap grow together without cluster growth | go_goroutines and a pprof heap profile |
| Large un-paginated LIST requests | Sudden memory spikes correlating with elevated LIST latency | apiserver_response_sizes and client request patterns |
Quick checks
```shell
# Check API server resident memory
kubectl get --raw /metrics | grep ^process_resident_memory_bytes

# Check for recent OOM kills in the kernel log (requires node access)
dmesg -T | grep -i "oom\|killed process"

# Total object count per resource type
kubectl get --raw /metrics | grep ^apiserver_storage_objects

# Go GC pause duration (pressure indicator)
kubectl get --raw /metrics | grep ^go_gc_duration_seconds

# Goroutine count (leak indicator)
kubectl get --raw /metrics | grep ^go_goroutines

# API server container restart count
kubectl -n kube-system get pods -l component=kube-apiserver \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}'

# LIST request rate (re-list storm detection)
kubectl get --raw /metrics | grep 'apiserver_request_total.*verb="LIST"'

# Capture heap profile (safe; may add brief load).
# Going through kubectl reuses your kubeconfig credentials; an unauthenticated
# curl against port 6443 is normally rejected. Requires profiling to be enabled.
kubectl get --raw /debug/pprof/heap > /tmp/apiserver-heap.pprof
```
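The raw apiserver_storage_objects output is easier to act on once it is ranked by count. The following is a minimal sketch, assuming the metric is exposed in the usual Prometheus text format with a resource label. The function name top_storage_objects is a hypothetical helper, not part of kubectl.

```shell
# Hypothetical helper: rank resource types by stored object count.
# Reads Prometheus text-format metrics on stdin; $1 = rows to print (default 10).
top_storage_objects() {
  awk 'index($0, "apiserver_storage_objects{") == 1 {
         split($1, parts, "\"")              # parts[2] holds the resource label value
         printf "%.0f %s\n", $NF, parts[2]   # normalize values such as 1.2e+06
       }' | sort -k1,1nr | head -n "${1:-10}"
}
```

A typical invocation would be `kubectl get --raw /metrics | top_storage_objects 5`, which prints one `count resource` pair per line, largest first.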
How to diagnose it
1. Confirm the OOM cycle. Look at process_resident_memory_bytes for a saw-tooth pattern where memory climbs to the limit, drops sharply, and repeats. Correlate the drops with OOM events in dmesg and with container restart counts. A restart count that climbs in step with the memory peaks confirms that the kernel is killing the process, rather than a panic or clean shutdown. If restarts are not correlated with OOM events, look for panics or liveness probe failures instead.
2. Identify the memory consumer. Compare go_memstats_heap_inuse_bytes to process_resident_memory_bytes. If heap tracks closely with RSS, the watch cache or in-flight request buffers are the primary consumers. Check apiserver_storage_objects to see which resource types dominate. If heap is much smaller than RSS, suspect fragmentation or off-heap retention. A dominant resource type tells you to tune caching; a leak pattern tells you to capture profiles.
3. Check for a re-list storm after restart. After an OOM kill or rolling restart, watch for a spike in apiserver_request_total{verb="LIST"} and elevated LIST latency. The first LIST requests after a restart are served from etcd because the watch cache is cold. A sudden surge means clients are re-syncing en masse. If you see this, you need more warm-up headroom before you can lower the memory limit again.
4. Check for goroutine leaks. If go_goroutines climbs steadily without a matching increase in watch connections or traffic, a leak is consuming memory via stack overhead and retained objects. This is more likely after a custom admission webhook or controller integration change. A flat goroutine count points to cache growth, not a leak.
5. Audit recent changes. Review recent additions of MutatingWebhookConfiguration, ValidatingWebhookConfiguration, or CRDs. Each new webhook adds per-request allocation, and each new resource type adds a dedicated watch cache. Watch cache memory is retained per resource type, and each API server replica holds its own copy, so total control plane memory scales linearly with the number of replicas. If growth started after a deployment or CRD installation, you have found the trigger.
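The heap-versus-RSS comparison in the second step can be reduced to a single number. This is a rough sketch, assuming both metrics appear as unlabeled series in the API server's Prometheus text output. The helper name heap_to_rss_pct is hypothetical, and the sketch deliberately applies no threshold, since what counts as "tracks closely" depends on your cluster.

```shell
# Hypothetical helper: print Go heap-in-use as a percentage of process RSS.
# Reads Prometheus text-format metrics on stdin; exits non-zero if RSS is missing.
heap_to_rss_pct() {
  awk '$1 == "go_memstats_heap_inuse_bytes"  { heap = $2 }
       $1 == "process_resident_memory_bytes" { rss = $2 }
       END {
         if (rss > 0) { printf "%.0f\n", 100 * heap / rss } else { exit 1 }
       }'
}
```

Run it as `kubectl get --raw /metrics | heap_to_rss_pct`. A value that stays high points at caches and request buffers; a value that stays low while RSS keeps growing points toward fragmentation or off-heap retention.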
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| process_resident_memory_bytes | Resident memory is what the OOM killer sees | > 80% of container limit |
| go_memstats_heap_inuse_bytes | Active heap holds watch caches and deserialized objects | Growing steadily over days |
| go_gc_duration_seconds | GC pause time rises when the heap is crowded | p99 > 100ms |
| apiserver_storage_objects | Object count in etcd | Growth without corresponding cleanup |
| go_goroutines | Goroutine leaks retain stack memory and prevent collection | Sustained growth without traffic increase |
| apiserver_request_total{verb="LIST"} | Re-list storms after restart or cache overflow | Sudden spike after OOM or API server restart |
| Container restart count | OOM kills appear as container restarts | > 1 restart in 5 minutes |
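The 80% warning sign for resident memory can be checked ad hoc before it is wired into alerting. Below is a minimal sketch; the helper name rss_limit_check is hypothetical, and the container limit must be supplied in bytes by the caller because the metrics endpoint does not report it.

```shell
# Hypothetical helper: report RSS as a percentage of the container memory limit.
# Reads metrics on stdin; $1 = limit in bytes, $2 = warning threshold in percent (default 80).
# Exit status: 0 = below threshold, 1 = threshold exceeded, 2 = invalid limit.
rss_limit_check() {
  awk -v limit="$1" -v warn="${2:-80}" '
    $1 == "process_resident_memory_bytes" { rss = $2 }
    END {
      if (limit <= 0) { exit 2 }
      used = 100 * rss / limit
      printf "%.0f%% of limit\n", used
      exit (used > warn ? 1 : 0)
    }'
}
```

For an 8 GiB limit, the call would look like `kubectl get --raw /metrics | rss_limit_check 8589934592`.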
Fixes
If the cause is memory limits set too low
Increase the API server container memory limit immediately. This is the fastest way to break the crash loop. Then set the GOMEMLIMIT environment variable to approximately 90% of the container memory limit. GOMEMLIMIT is a soft limit: it makes the Go runtime collect garbage more aggressively as the heap approaches it, which often keeps the process under the cgroup limit so the kernel OOM killer never acts. It cannot help if the live data itself exceeds the limit.
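The 90% rule is simple arithmetic, sketched below under the assumption that you know the container limit in MiB. The helper name gomemlimit_for is hypothetical. The Go runtime accepts values with a MiB suffix in GOMEMLIMIT, provided the binary was built with Go 1.19 or newer.

```shell
# Hypothetical helper: derive a GOMEMLIMIT value from the container memory limit.
# $1 = container limit in MiB, $2 = percentage of the limit to use (default 90).
gomemlimit_for() {
  limit_mib=$1
  pct=${2:-90}
  # Integer arithmetic rounds down, which errs on the side of more headroom.
  echo "$(( limit_mib * pct / 100 ))MiB"
}
```

For an 8 GiB limit, `gomemlimit_for 8192` prints `7372MiB`, which would then be set as the GOMEMLIMIT environment variable on the kube-apiserver container.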
If the cause is watch cache growth
Evaluate whether every resource type needs an in-memory watch cache. The API server exposes the --watch-cache-sizes flag, which takes comma-separated resource[.group]#size entries. For example, --watch-cache-sizes=pods#0 disables the pod watch cache entirely. In current releases zero is the only meaningful size, because cache capacity is otherwise sized dynamically, and the flag applies only to built-in resources, not to CRDs. The tradeoff is that uncached LIST and WATCH requests are served from etcd, which increases read latency and etcd load. In HA deployments, remember that every API server instance maintains its own watch cache, so total control plane memory scales with the number of replicas.
If the cause is post-restart re-list storms
Ensure the memory limit can absorb the warm-up burst. Memory can spike 2-3x above the steady-state baseline during re-list storms. If the instance is crash-looping, raise the limit temporarily to allow caches to populate. Then investigate why clients are forcing full re-lists. LIST requests with no resourceVersion ask for the most recent data, which on many versions bypasses the watch cache and is served from etcd. Slow or inactive watchers that let events accumulate in server-side buffers can also inflate memory.
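The LIST counters are cumulative, so spotting a storm means comparing two snapshots. The sketch below is one way to do that, assuming both snapshots come from the same API server instance (behind a load balancer, consecutive requests may land on different replicas). The helper names sum_list_requests and list_rate_per_sec are hypothetical.

```shell
# Hypothetical helpers for estimating the LIST request rate from two metric snapshots.

# Sum every apiserver_request_total series with verb="LIST" (metrics on stdin).
sum_list_requests() {
  awk 'index($0, "apiserver_request_total{") == 1 && index($0, "verb=\"LIST\"") > 0 { total += $NF }
       END { printf "%.0f\n", total }'
}

# $1 = earlier total, $2 = later total, $3 = seconds between the snapshots.
list_rate_per_sec() {
  awk -v before="$1" -v after="$2" -v secs="$3" \
    'BEGIN { printf "%.1f\n", (after - before) / secs }'
}

# Example usage against a live cluster:
#   before=$(kubectl get --raw /metrics | sum_list_requests); sleep 30
#   after=$(kubectl get --raw /metrics | sum_list_requests)
#   list_rate_per_sec "$before" "$after" 30
```

Comparing the rate shortly after a restart with the rate during normal operation shows how much of the warm-up burst comes from clients re-listing.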
If the cause is a memory leak
Capture a pprof heap profile and goroutine dump via the /debug/pprof endpoint. If the leak correlates with a specific admission webhook or aggregated API server, remove or upgrade the offending component. If the leak is in core kube-apiserver, plan a patch upgrade. Do not simply raise the limit indefinitely without identifying the consumer.
If the cause is large un-paginated LIST requests
Audit clients, controllers, and CI pipelines for cluster-scoped LIST calls that do not use limit and continue. The API server assembles the full response in memory before transmission. A single LIST touching thousands of large objects can consume gigabytes of RAM temporarily. Enforce pagination at API gateways or via client configuration.
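For scripts that call the API directly, pagination means passing limit and following the continue token until it is empty. The loop below is a sketch under several assumptions: list_in_pages and kubectl_raw are hypothetical helper names, the response is compact JSON as returned by kubectl get --raw, and the token is extracted with sed only to keep the example dependency-free. A real client should use a JSON parser or a client library's built-in pagination.

```shell
# Default fetcher: raw GET through kubectl (uses your kubeconfig credentials).
kubectl_raw() { kubectl get --raw "$1"; }

# Hypothetical helper: walk a LIST endpoint page by page and print the page count.
# $1 = API path (for example /api/v1/pods), $2 = page size,
# $3 = fetch function taking a URL (default kubectl_raw; injectable for testing).
list_in_pages() {
  path=$1; limit=$2; fetch=${3:-kubectl_raw}
  token=""; pages=0
  while :; do
    url="${path}?limit=${limit}"
    if [ -n "$token" ]; then url="${url}&continue=${token}"; fi
    page=$("$fetch" "$url") || return 1
    pages=$((pages + 1))
    # Process "$page" here, one bounded chunk at a time.
    # Assumptions: the token sits in metadata.continue and is URL-safe;
    # URL-encode it if that does not hold in your environment.
    token=$(printf '%s\n' "$page" | sed -n 's/.*"continue":"\([^"]*\)".*/\1/p')
    if [ -z "$token" ]; then break; fi
  done
  echo "$pages"
}
```

Calling `list_in_pages /api/v1/pods 500` keeps each response bounded to 500 objects. For interactive use, kubectl get already paginates through its --chunk-size flag.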
Prevention
- Size for burst, not steady state. Allocate enough memory to survive 2-3x baseline RSS while caches warm after a restart.
- Monitor the post-GC baseline. Track go_memstats_heap_inuse_bytes after collection. A rising baseline predicts OOM days in advance, while peak RSS alone is noisy.
- Set GOMEMLIMIT. Configure it to roughly 90% of the container memory limit so the Go runtime can throttle memory before the kernel intervenes.
- Track object counts per resource type. Use apiserver_storage_objects to detect accumulation of events, completed jobs, or orphaned CRD instances.
- Review CRD and webhook sprawl. Each new resource type and admission webhook adds memory overhead that should trigger a capacity review.
- Enforce pagination. Block or discourage unbounded LIST requests from automation and dashboards.
- Test restart behavior. Measure informer sync time and memory consumption after a controlled API server restart to validate your headroom.
How Netdata helps
- Correlates process_resident_memory_bytes, go_memstats_heap_inuse_bytes, and container restart events on the same timeline to confirm an OOM cycle.
- Surfaces object counts per resource type so you can pinpoint which CRD or object type is driving growth.
- Tracks go_gc_duration_seconds alongside API request latency to show when memory pressure degrades response times before the OOM kill.
- Alerts on control plane container OOM kills and restart loops.
- Visualizes LIST request rate spikes that precede or follow OOM events, connecting the crash to the re-list storm.
Related guides
- See Kubernetes API server certificate rotation: detection and grace handling
- See Kubernetes API server etcd latency: detection and cascading failures
- See Kubernetes API server rate limiting: APF priority levels and starvation
- See Kubernetes API server slow or unresponsive: causes and fixes
- See Kubernetes eviction cascade: when one node failure takes down the cluster
- See Kubernetes kubelet memory leak: detection and OOM cycle






