Kubernetes ResourceQuota exceeded: detection and remediation
A Deployment looks healthy in kubectl get deployment but the new ReplicaSet has zero pods. A Job is accepted but never creates a pod. A CI/CD pipeline fails with opaque 403 errors from the API server. These symptoms point to ResourceQuota exhaustion. Confirm the quota is the blocker, identify the exhausted resource, and fix it.
What this means
ResourceQuota is a namespaced object that sets hard aggregate limits on resource consumption; the ResourceQuota admission controller enforces it. When a create request would push a tracked resource past its hard limit, the API server rejects the request with HTTP 403 Forbidden and a message containing exceeded quota: <quota-name>. Existing pods keep running; quota is enforced at admission time, not by terminating workloads.
Quota tracks requested resources, not actual usage. If a namespace quota covers requests.cpu or requests.memory, every new pod must specify a request for that resource. A pod with no request is rejected even if the quota has remaining capacity, unless a LimitRange supplies a default. The pods quota counts only non-terminal pods: Pending, Running, and Unknown pods count, while pods in the Succeeded or Failed phase (such as completed Job pods) do not.
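As a minimal sketch of the object described above, the following writes a hypothetical ResourceQuota manifest to a local file. The namespace team-a, the quota name team-a-quota, the file name, and every limit value are illustrative assumptions, not recommendations.

```sh
# Write an illustrative ResourceQuota manifest to a local file.
# Namespace, quota name, and all values below are hypothetical.
cat > resourcequota-example.yaml <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    pods: "20"
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    persistentvolumeclaims: "10"
    requests.storage: 100Gi
EOF

# Validate locally before applying (requires kubectl):
#   kubectl apply --dry-run=client -f resourcequota-example.yaml
```

Because this sketch tracks requests.cpu, requests.memory, limits.cpu, and limits.memory, every container admitted to the namespace would need all four values set, either in the pod template or through LimitRange defaults.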
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Rolling update with no surge headroom | Deployment rollout stalls; new ReplicaSet creates zero pods | kubectl get rs -n <namespace> and compare maxSurge to quota slack |
| Jobs silently retrying without creating pods | Job object exists but no pods appear; no visible errors in kubectl get jobs | kubectl describe job <name> -n <namespace> for FailedCreate events |
| Missing resource requests on new pods | Pod create rejected even though total usage seems low | Pod template for resources.requests |
| Operator or CI/CD object leaks | Quota consumed by Secrets, ConfigMaps, or PVCs rather than pods | Resource breakdown in kubectl describe resourcequota |
| Ephemeral storage or PVC exhaustion | PVC Pending after scheduling | kubectl get pvc -n <namespace> and requests.storage quota |
Quick checks
# Check all quotas and their current usage across namespaces
kubectl get resourcequota -A
# Describe a specific quota to see which resource is exhausted
kubectl describe resourcequota -n <namespace> <quota-name>
# Get structured used vs hard values
kubectl get resourcequota -n <namespace> -o json | \
  jq '.items[] | {name: .metadata.name, hard: .status.hard, used: .status.used}'
# Find recent quota-related creation failures
kubectl get events -n <namespace> --field-selector reason=FailedCreate
# Check if a Deployment rollout is stuck
kubectl rollout status deployment/<name> -n <namespace>
# Check if a Job is silently retrying without creating pods
kubectl describe job <name> -n <namespace>
# Verify pods have requests set when cpu/memory quota exists
kubectl get pod <name> -n <namespace> -o jsonpath='{.spec.containers[*].resources.requests}'
How to diagnose it
1. Confirm the quota block. Look for FailedCreate events in the namespace. The event message contains exceeded quota: followed by the quota name and the exhausted resource. If events have expired, check kubectl describe resourcequota directly.
2. Identify the exhausted resource. kubectl describe resourcequota shows a table with Resource, Used, and Hard. Any resource where Used equals Hard is the blocker. Common culprits: pods, requests.cpu, requests.memory, limits.cpu, limits.memory, requests.storage, persistentvolumeclaims, services, secrets, or configmaps.
3. Check non-terminal pods. The pods quota counts Pending and Running pods. If pods quota is exhausted but few pods are Running, look for Pending pods stuck unschedulable or pods in Unknown phase due to node pressure.
4. Inspect rolling-update behavior. If the quota is sized for steady-state capacity and a Deployment uses RollingUpdate, the new ReplicaSet cannot create pods until old pods terminate. The Deployment appears healthy but the rollout stalls. Check ReplicaSet pod counts and compare maxSurge against available quota headroom.
5. Evaluate Job behavior. The Job object is accepted but its pods are rejected. The controller retries silently. Check the Job’s events for quota failures.
6. Verify LimitRange interaction. If a pod omits requests and the LimitRange does not provide defaults for a resource tracked by the quota, admission rejects the pod even if the quota is not exhausted.
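The "Used equals Hard" check can be scripted. The sketch below is a hypothetical helper named exhausted_resources; it assumes the three-column Resource / Used / Hard table that kubectl describe resourcequota prints and compares the two values as plain strings, so it would miss a match written in different units (for example 4 versus 4000m).

```sh
# exhausted_resources: read `kubectl describe resourcequota` style output on
# stdin and print each resource whose Used column equals its Hard column.
# Assumption: values are compared as strings, not as parsed quantities.
exhausted_resources() {
  awk '
    NF == 0        { in_table = 0 }            # blank line ends a quota block
    $1 ~ /^-+$/    { in_table = 1; next }      # "--------  ----  ----" row starts the table
    in_table && NF == 3 && ($2 "") == ($3 "") { print $1 }
  '
}

# Example usage (the namespace name is a placeholder):
#   kubectl describe resourcequota -n team-a | exhausted_resources
```

Header lines such as Name: and Namespace: are skipped because only rows after the dashed separator are considered.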
flowchart TD
A[Pod create failed or rollout stalled] --> B{Events show exceeded quota?}
B -->|Yes| C[Describe ResourceQuota]
B -->|No| D[Check FailedCreate events]
D --> C
C --> E{Which resource is at hard limit?}
E -->|pods| F[Check maxSurge and Pending/Unknown pods]
E -->|cpu/memory| G[Check pod requests and LimitRange defaults]
E -->|storage/PVC| H[Check PVC claims and storage quota]
E -->|secrets/configmaps| I[Audit operator or CI/CD object creation]
F --> J[Adjust quota, reduce surge, or delete stuck pods and unused objects]
G --> J
H --> J
I --> J
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| kube_resourcequota used vs hard | Tracks utilization of every quota resource type | Any resource > 80% of hard limit |
| FailedCreate event rate | Direct evidence of admission blocking pod creation | Sustained nonzero rate in a namespace |
| Pending pods count | Quota-blocked pods may be stuck in Pending | Pending pods increasing while scheduler is healthy |
| Deployment ready vs desired replicas | Detects rollout stalls from lack of quota headroom | readyReplicas < spec.replicas for extended period |
| Job active pod count vs completions | Jobs silently back off when quota blocks pod creation | Active pods stuck at zero with completions desired |
| Namespace object count by type | Identifies which object type is consuming count quota | Rapid growth in ConfigMaps, Secrets, or PVCs |
Fixes
If the namespace is genuinely overcommitted
Delete unnecessary pods, scale down Deployments, or remove unused PVCs, Secrets, and ConfigMaps. If the workload is legitimate and persistent, raise the ResourceQuota hard value or move the workload to a less constrained namespace.
If the cause is rolling-update surge
Size the pods quota to steady_state_pods + maxSurge, or reduce maxSurge so the total pod count during rollout stays under quota. If you cannot change quota, switch the Deployment strategy to Recreate, accepting downtime during updates.
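The sizing rule is simple arithmetic, sketched here as a hypothetical shell helper named rollout_peak_pods. It relies on the Kubernetes behavior that a percentage maxSurge is rounded up to a whole pod, and it ignores old pods that are still terminating, which may hold quota briefly, so treat the result as a minimum.

```sh
# rollout_peak_pods REPLICAS MAX_SURGE
# Prints the pod count a RollingUpdate can reach: replicas + maxSurge.
# MAX_SURGE may be an absolute number ("2") or a percentage ("25%");
# a percentage is rounded up to the next whole pod.
rollout_peak_pods() {
  replicas=$1
  surge=$2
  case "$surge" in
    *%)
      pct=${surge%\%}                            # strip the trailing percent sign
      surge=$(( (replicas * pct + 99) / 100 ))   # integer ceiling
      ;;
  esac
  echo $(( replicas + surge ))
}

rollout_peak_pods 10 25%   # 10 + ceil(2.5) = 13
rollout_peak_pods 10 2     # 10 + 2 = 12
```

If the pods quota is below the printed value, the rollout can only proceed as old pods terminate and release quota.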
If the cause is non-terminal pods holding quota
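Delete stuck Pending or Unknown pods manually; they hold pods quota until they are removed. Succeeded and Failed pods do not count against the pods quota, so cleaning up finished Jobs does not free it. Setting ttlSecondsAfterFinished on Jobs still helps when the namespace also has object-count quotas (for example count/jobs.batch), because finished Jobs are then garbage collected promptly.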
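To find the pods that hold pods quota without doing useful work, one option is to list each pod's phase and filter for Pending and Unknown. The helper name stuck_pods and the namespace team-a are hypothetical; the sketch assumes two whitespace-separated columns (name, phase) on stdin.

```sh
# stuck_pods: read "NAME PHASE" lines on stdin and print the names of pods
# that are Pending or Unknown. Both phases count against the pods quota.
stuck_pods() {
  awk '$2 == "Pending" || $2 == "Unknown" { print $1 }'
}

# Example input, produced with kubectl (the namespace name is a placeholder):
#   kubectl get pods -n team-a --no-headers \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase | stuck_pods
```

Review the list before deleting anything: a Pending pod may simply be waiting for a node or an image pull.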
If the cause is LimitRange interaction
Ensure every pod template specifies resources.requests for every resource tracked by the quota, or configure a LimitRange to supply defaults. Without defaults, admission rejects pods that omit requests even if the quota is not full.
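A LimitRange that supplies defaults might look like the sketch below, written to a local file. The object name default-requests, the namespace team-a, the file name, and all values are assumptions for illustration only.

```sh
# Write an illustrative LimitRange manifest to a local file.
# defaultRequest fills in missing requests; default fills in missing limits.
# Names and values below are hypothetical.
cat > limitrange-example.yaml <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: default-requests
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      default:
        cpu: 500m
        memory: 512Mi
EOF

# Validate locally before applying (requires kubectl):
#   kubectl apply --dry-run=client -f limitrange-example.yaml
```

Defaults apply only to containers created after the LimitRange exists, and the defaulted values still count against the quota.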
If the cause is a transient admission race
During rolling updates, the quota controller’s informer cache may lag, so status.used can briefly drift from the real total and cause sporadic rejections. If intermittent quota-exceeded errors resolve within seconds, retry the operation. Persistent errors are not transient and require the fixes above.
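For the transient case, a bounded retry is usually enough. The function below is a generic sketch, not a kubectl feature; the name retry_cmd, the attempt count, and the delay are all assumptions to adapt.

```sh
# retry_cmd MAX_ATTEMPTS DELAY_SECONDS command [args...]
# Runs the command until it succeeds or MAX_ATTEMPTS is reached,
# sleeping DELAY_SECONDS between attempts. Returns 0 on success, 1 otherwise.
retry_cmd() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example usage (file and namespace names are placeholders):
#   retry_cmd 5 2 kubectl apply -n team-a -f deployment.yaml
```

Keep the attempt limit low. If the command still fails after a few seconds, the quota is genuinely exhausted and retrying will not help.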
Prevention
- Account for surge in quota sizing. Set pods quota to (max replicas) + maxSurge for the largest Deployment, plus headroom for DaemonSets and standalone pods.
- Alert before the wall. Monitor kube_resourcequota and alert when any resource exceeds 80% of its hard limit. Do not wait for 100%.
- Clean up completed Jobs. Use ttlSecondsAfterFinished on Jobs and failedJobsHistoryLimit on CronJobs to prevent completed pods from lingering.
- Audit namespace object growth. Secrets, ConfigMaps, and PVCs created by operators or CI/CD pipelines can silently exhaust count quotas.
- Use LimitRange defaults. Pair ResourceQuota with a LimitRange that sets default cpu/memory requests so pods are not rejected for omitting them.
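The 80% alert can be expressed as a ratio of the used and hard series of kube_resourcequota. The sketch below assumes a Prometheus-compatible rule evaluator and the kube-state-metrics labels namespace, resourcequota, resource, and type; the alert name, file name, and the 10-minute hold are hypothetical, and other alerting systems would be configured differently.

```sh
# Write an illustrative Prometheus-style alerting rule to a local file.
# Assumes kube-state-metrics exposes kube_resourcequota with type="used"
# and type="hard" series. Alert name and timings are hypothetical.
cat > quota-alert-rules.yaml <<'EOF'
groups:
  - name: resourcequota
    rules:
      - alert: ResourceQuotaAbove80Percent
        expr: |
          kube_resourcequota{type="used"}
            / ignoring(type)
          (kube_resourcequota{type="hard"} > 0)
            > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Quota {{ $labels.resourcequota }} in {{ $labels.namespace }} is above 80% for {{ $labels.resource }}"
EOF

# Validate the rule file if promtool is installed:
#   promtool check rules quota-alert-rules.yaml
```

The > 0 filter on the hard series avoids dividing by a zero limit, which some namespaces use to forbid a resource entirely.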
How Netdata helps
Netdata surfaces kube_resourcequota metrics from kube-state-metrics. Use them to:
- Chart used against hard per namespace and resource type to spot approaching limits before failures occur.
- Correlate spikes in FailedCreate event rates with quota utilization to confirm the bottleneck is admission, not scheduling or image pulls.
- Overlay Deployment replica counts and pending pod counts to distinguish rollout stalls caused by quota exhaustion from application errors.
Related guides
- See Kubernetes Deployment rollout stuck: stalled rollouts and ready replicas for diagnosing rollout stalls that overlap with quota issues.
- See Kubernetes DaemonSet pods Pending: scheduling and tolerations when quota blocks DaemonSet scheduling.
- See Kubernetes API server slow or unresponsive: causes and fixes if quota failures correlate with elevated API latency.






