Kubernetes ResourceQuota exceeded: detection and remediation

A Deployment looks healthy in kubectl get deployment, but the new ReplicaSet has zero pods. A Job is accepted but never creates a pod. A CI/CD pipeline fails with opaque 403 errors from the API server. These symptoms often point to ResourceQuota exhaustion. The steps in this guide confirm that quota is the blocker, identify the exhausted resource, and fix it.

What this means

ResourceQuota is a namespace-scoped object, enforced by the ResourceQuota admission controller, that sets hard aggregate limits on resource consumption. When a create request would push a tracked resource past its hard limit, the API server rejects it with HTTP 403 Forbidden and a message containing exceeded quota: <quota-name>. Existing pods keep running; quota is enforced at admission time, not by terminating workloads.
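
As an illustration of what to look for in a rejection, the sketch below pulls the quota name and the limiting resource out of an error string. The sample message follows the usual shape of a quota denial; the pod and quota names are invented, and the helper names quota_name and quota_limited are hypothetical.

```shell
# Sample rejection text in the shape the API server returns for a quota denial.
# The pod name and quota name are made up for illustration.
sample_error='Error from server (Forbidden): pods "web-7d9f" is forbidden: exceeded quota: compute-quota, requested: pods=1, used: pods=10, limited: pods=10'

# Print the quota name from an "exceeded quota" message read on stdin.
quota_name() {
  sed -n 's/.*exceeded quota: \([^,]*\),.*/\1/p'
}

# Print the "limited:" clause, which names the resource(s) at the hard limit.
quota_limited() {
  sed -n 's/.*limited: \(.*\)$/\1/p'
}
```

Piping a CI log line or an event message through quota_name tells you which ResourceQuota object to describe; quota_limited tells you which row of that quota to look at first.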

Quota tracks requested resources, not actual usage. If a namespace quota covers requests.cpu or requests.memory, every new pod must specify a request for that resource, either in its own spec or through a LimitRange default. A pod with no request is rejected even if the quota has remaining capacity. The pods quota counts only non-terminal pods: Pending, Running, and Unknown. Completed Job pods in the Succeeded or Failed phase do not count against it.
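
The counting rule for the pods quota can be approximated from pod phases alone. This is a minimal sketch, assuming one phase per line on stdin; count_quota_pods is a hypothetical helper, and the real evaluator also considers pods that are being deleted, so treat the result as an estimate.

```shell
# Count pods that hold a slot in the pods quota: every phase except the
# terminal ones (Succeeded, Failed). Reads one phase per line on stdin.
count_quota_pods() {
  awk '$1 != "" && $1 != "Succeeded" && $1 != "Failed" { n++ } END { print n + 0 }'
}

# Example use against a live namespace:
#   kubectl get pods -n <namespace> --no-headers \
#     -o custom-columns=PHASE:.status.phase | count_quota_pods
```

Compare the number with the Used value for pods in kubectl describe resourcequota. A large gap between the two is worth investigating before changing the quota.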

Common causes

Cause	What it looks like	First thing to check
Rolling update with no surge headroom	Deployment rollout stalls; new ReplicaSet creates zero pods	`kubectl get rs -n <namespace>` and compare maxSurge to quota slack
Jobs silently retrying without creating pods	Job object exists but no pods appear; no visible errors in `kubectl get jobs`	`kubectl describe job <name> -n <namespace>` for FailedCreate events
Missing resource requests on new pods	Pod create rejected even though total usage seems low	Pod template for `resources.requests`
Operator or CI/CD object leaks	Quota consumed by Secrets, ConfigMaps, or PVCs rather than pods	Resource breakdown in `kubectl describe resourcequota`
Storage or PVC quota exhaustion	PVC create rejected; StatefulSet pods that need a new claim never start	`kubectl get pvc -n <namespace>` and the `requests.storage` and `persistentvolumeclaims` quota

Quick checks

# Check all quotas and their current usage across namespaces
kubectl get resourcequota -A

# Describe a specific quota to see which resource is exhausted
kubectl describe resourcequota -n <namespace> <quota-name>

# Get structured used vs hard values
kubectl get resourcequota -n <namespace> -o json | \
  jq '.items[] | {name: .metadata.name, hard: .status.hard, used: .status.used}'

# Find recent quota-related creation failures
kubectl get events -n <namespace> --field-selector reason=FailedCreate

# Check if a Deployment rollout is stuck
kubectl rollout status deployment/<name> -n <namespace>

# Check if a Job is silently retrying without creating pods
kubectl describe job <name> -n <namespace>

# Verify pods have requests set when cpu/memory quota exists
kubectl get pod <name> -n <namespace> -o jsonpath='{.spec.containers[*].resources.requests}'

How to diagnose it

Confirm the quota block. Look for FailedCreate events in the namespace. The event message contains exceeded quota: followed by the quota name and the exhausted resource. If events have expired, check kubectl describe resourcequota directly.
Identify the exhausted resource. kubectl describe resourcequota shows a table with Resource, Used, and Hard. Any resource where Used has reached Hard, or where Used plus the new request would exceed Hard, is the blocker. Common culprits: pods, requests.cpu, requests.memory, limits.cpu, limits.memory, requests.storage, persistentvolumeclaims, services, secrets, or configmaps.
Check non-terminal pods. The pods quota counts Pending, Running, and Unknown pods. If pods quota is exhausted but few pods are Running, look for Pending pods stuck unschedulable or pods in Unknown phase due to node pressure.
Inspect rolling-update behavior. If the quota is sized for steady-state capacity and a Deployment uses RollingUpdate, the new ReplicaSet cannot create pods until old pods terminate. The Deployment appears healthy but the rollout stalls. Check ReplicaSet pod counts and compare maxSurge against available quota headroom.
Evaluate Job behavior. The Job object is accepted but its pods are rejected. The controller retries silently. Check the Job’s events for quota failures.
Verify LimitRange interaction. If a pod omits requests and the LimitRange does not provide defaults for a resource tracked by the quota, admission rejects the pod even if the quota is not exhausted.
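
The "identify the exhausted resource" step can be sketched as a filter over the describe output. Used and Hard are often written with different suffixes (4000m against 4, for example), so the sketch normalises quantities before comparing. exhausted_resources is a hypothetical helper, and it handles only the common suffixes (m, k, M, G, T, Ki, Mi, Gi, Ti).

```shell
# Print resources whose Used value has reached Hard, reading the table that
# `kubectl describe resourcequota` prints on stdin.
exhausted_resources() {
  awk '
    # Convert a quantity such as 500m, 2, 6Gi or 1G to a plain number.
    function base(q,    n, s) {
      n = q; s = ""
      if (match(q, /[A-Za-z]+$/)) {
        s = substr(q, RSTART)
        n = substr(q, 1, RSTART - 1)
      }
      if (s == "m")  return n / 1000
      if (s == "k")  return n * 1000
      if (s == "M")  return n * 1000 * 1000
      if (s == "G")  return n * 1000 * 1000 * 1000
      if (s == "T")  return n * 1000 * 1000 * 1000 * 1000
      if (s == "Ki") return n * 1024
      if (s == "Mi") return n * 1024 * 1024
      if (s == "Gi") return n * 1024 * 1024 * 1024
      if (s == "Ti") return n * 1024 * 1024 * 1024 * 1024
      return n + 0
    }
    # Data rows have three fields and a Used column that starts with a digit.
    NF == 3 && $2 ~ /^[0-9]/ && base($2) >= base($3) { print $1 }
  '
}

# Example use:
#   kubectl describe resourcequota -n <namespace> <quota-name> | exhausted_resources
```

An empty result does not rule quota out: a resource slightly below its hard limit still blocks a pod whose request does not fit in the remaining headroom.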

flowchart TD
    A[Pod create failed or rollout stalled] --> B{Events show exceeded quota?}
    B -->|Yes| C[Describe ResourceQuota]
    B -->|No| D[Events may have expired, check quota usage directly]
    D --> C
    C --> E{Which resource is at hard limit?}
    E -->|pods| F[Check maxSurge and Pending/Unknown pods]
    E -->|cpu/memory| G[Check pod requests and LimitRange defaults]
    E -->|storage/PVC| H[Check PVC claims and storage quota]
    E -->|secrets/configmaps| I[Audit operator or CI/CD object creation]
    F --> J[Adjust quota, reduce surge, or delete stuck pods and unused objects]
    G --> J
    H --> J
    I --> J

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`kube_resourcequota` used vs hard	Tracks utilization of every quota resource type	Any resource > 80% of hard limit
`FailedCreate` event rate	Direct evidence of admission blocking pod creation	Sustained nonzero rate in a namespace
Pending pods count	Pending pods hold quota before they run; pods rejected by quota are never created and will not appear here	Pending pods accumulating while quota utilization rises
Deployment ready vs desired replicas	Detects rollout stalls from lack of quota headroom	`readyReplicas < spec.replicas` for extended period
Job active pod count vs completions	Jobs silently back off when quota blocks pod creation	Active pods stuck at zero with completions desired
Namespace object count by type	Identifies which object type is consuming count quota	Rapid growth in ConfigMaps, Secrets, or PVCs

Fixes

If the namespace is genuinely overcommitted

Delete unnecessary pods, scale down Deployments, or remove unused PVCs, Secrets, and ConfigMaps. If the workload is legitimate and persistent, raise the ResourceQuota hard value or move the workload to a less constrained namespace.
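
If raising the hard value is the right call, one way to do it is a JSON merge patch against the quota object. The sketch below only builds the patch body; quota_patch is a hypothetical helper, and the value 30 in the usage line is a placeholder rather than a recommendation.

```shell
# Build a JSON merge patch that sets one hard limit on a ResourceQuota.
# Arguments: resource name (for example pods or requests.cpu) and new value.
quota_patch() {
  printf '{"spec":{"hard":{"%s":"%s"}}}\n' "$1" "$2"
}

# Example use:
#   kubectl patch resourcequota <quota-name> -n <namespace> \
#     --type merge -p "$(quota_patch pods 30)"
```

If the quota is managed from a repository or by a controller, a manual patch is likely to be reverted on the next sync, so make the same change at the source.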

If the cause is rolling-update surge

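Size the pods quota to steady_state_pods + maxSurge, or reduce maxSurge so the total pod count during rollout stays under quota. Setting maxSurge to 0 with a nonzero maxUnavailable lets the rollout replace pods without extra quota, at the cost of reduced capacity during the update. If neither option is acceptable and you cannot change the quota, switch the Deployment strategy to Recreate, accepting downtime during updates.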
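
The sizing arithmetic is easy to get wrong when maxSurge is a percentage, because Kubernetes rounds a percentage surge up to a whole pod. A minimal sketch of the calculation follows; peak_pods is a hypothetical helper and covers a single Deployment only.

```shell
# Pods that can exist at the peak of a rolling update: replicas plus maxSurge.
# maxSurge may be an absolute number ("2") or a percentage ("25%"); a
# percentage is rounded up to a whole pod.
peak_pods() (
  replicas=$1
  surge=$2
  case "$surge" in
    *%)
      pct=${surge%\%}
      surge=$(( (replicas * pct + 99) / 100 ))
      ;;
  esac
  echo $(( replicas + surge ))
)
```

For example, peak_pods 10 25% prints 13: a Deployment with 10 replicas and a 25% surge needs 13 pod slots during a rollout, so a pods quota of 10 stalls it.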

If the cause is non-terminal pods holding quota

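Delete stuck Pending or Unknown pods manually; they count against the pods quota until they are removed. Succeeded and Failed pods do not count against the pods quota, so cleaning up completed Job pods does not free pod slots. Setting ttlSecondsAfterFinished on Jobs still helps where object-count quotas such as count/jobs.batch are in place, because finished Jobs are garbage collected promptly.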

If the cause is LimitRange interaction

Ensure every pod template specifies resources.requests for every resource tracked by the quota, or configure a LimitRange to supply defaults. Without defaults, admission rejects pods that omit requests even if the quota is not full.
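
A LimitRange that supplies default requests might look like the manifest printed by the sketch below. render_limitrange is a hypothetical helper, and the object name default-requests and the values in the usage line are placeholders to adapt to the workloads in the namespace.

```shell
# Print a LimitRange manifest that gives containers default requests.
# Arguments: namespace, default CPU request, default memory request.
render_limitrange() {
  cat <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: default-requests
  namespace: $1
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: $2
      memory: $3
EOF
}

# Example use:
#   render_limitrange <namespace> 100m 128Mi | kubectl apply -f -
```

This covers quotas on requests.cpu and requests.memory. If the quota also tracks limits.cpu or limits.memory, the LimitRange needs a default block with limit values as well.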

If the cause is a transient admission race

During rolling updates, the quota controller’s informer cache may lag, causing status.used to drift and sporadic rejections. If intermittent quota-exceeded errors resolve within seconds, retry the operation. Persistent errors are not transient and require the fixes above.
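
For scripted creates, a bounded retry that reacts only to quota rejections keeps transient races from failing a pipeline while still surfacing other errors immediately. This is a sketch under the assumption that the wrapped command prints the API server error text; retry_on_quota is a hypothetical helper.

```shell
# Run a command, retrying only when it fails with a quota rejection.
# Arguments: max attempts, delay in seconds, then the command to run.
retry_on_quota() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while :; do
    if out=$("$@" 2>&1); then
      printf '%s\n' "$out"
      return 0
    fi
    printf '%s\n' "$out" >&2
    # Any error other than a quota rejection is returned immediately.
    case "$out" in
      *"exceeded quota"*) ;;
      *) return 1 ;;
    esac
    # Give up once the attempts are used up.
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example use:
#   retry_on_quota 3 5 kubectl apply -n <namespace> -f job.yaml
```

Keep the attempt count small. If the last attempt still reports exceeded quota, the condition is not transient and one of the other fixes applies.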

Prevention

Account for surge in quota sizing. Set the pods quota to the steady-state pod count of the namespace plus maxSurge for the largest Deployment, with extra headroom for DaemonSets and standalone pods.
Alert before the wall. Monitor kube_resourcequota and alert when any resource exceeds 80% of its hard limit. Do not wait for 100%.
Clean up completed Jobs. Use ttlSecondsAfterFinished on Jobs, and successfulJobsHistoryLimit and failedJobsHistoryLimit on CronJobs, to keep finished Jobs and their pods from accumulating.
Audit namespace object growth. Secrets, ConfigMaps, and PVCs created by operators or CI/CD pipelines can silently exhaust count quotas.
Use LimitRange defaults. Pair ResourceQuota with a LimitRange that sets default cpu/memory requests so pods are not rejected for omitting them.

How Netdata helps

Netdata surfaces kube_resourcequota metrics from kube-state-metrics. Use them to:

Chart used against hard per namespace and resource type to spot approaching limits before failures occur.
Correlate spikes in FailedCreate event rates with quota utilization to confirm the bottleneck is admission, not scheduling or image pulls.
Overlay Deployment replica counts and pending pod counts to distinguish rollout stalls caused by quota exhaustion from application errors.

Related guides

See Kubernetes Deployment rollout stuck: stalled rollouts and ready replicas for diagnosing rollout stalls that overlap with quota issues.
See Kubernetes DaemonSet pods Pending: scheduling and tolerations when quota blocks DaemonSet scheduling.
See Kubernetes API server slow or unresponsive: causes and fixes if quota failures correlate with elevated API latency.

The Netdata solution

Kubernetes monitoring with Netdata

Netdata monitors Kubernetes with per-second metrics across the control plane, nodes, and every pod, with ML anomaly detection and zero per-pod configuration. Correlate API-server and etcd latency, kubelet PLEG stalls, scheduling pressure, and OOMKills in one place.

