Kubernetes RBAC permission denied: detection and minimum-permission fix
A 403 Forbidden from the Kubernetes API server means the caller was authenticated but RBAC refused the action. In production, this appears as a Deployment stuck creating pods, a CI pipeline failing to patch a ConfigMap, a controller logging repeated forbidden errors, or an operator unable to finalize a custom resource. One missing verb on one resource in one namespace blocks the entire workflow.
RBAC denials are localized: they do not cascade like network partitions or etcd latency. They are easy to miss in aggregate monitoring but block the affected workload completely. This guide shows how to detect the exact principal, verb, resource, and namespace; reproduce the denial with kubectl auth can-i; and apply the minimum permission fix without resorting to cluster-admin.
What this means
The API server processes every request through authentication, authorization, and admission. A 403 means authentication succeeded but authorization failed.
RBAC evaluates the requested verb, resource, subresource, API group, and namespace against the rules in the bound Role or ClusterRole. If no rule matches, the API server returns:
pods is forbidden: User "system:serviceaccount:dev:my-sa" cannot create resource "pods" in API group "" in the namespace "prod"
The fix is to grant the minimum permission covering exactly that principal, verb, resource, API group, and namespace, not to bind the principal to cluster-admin.
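As a minimal sketch, the fields in that message can be pulled apart with a Bash regular expression before you start testing permissions. The function name `parse_forbidden` is hypothetical, and the pattern assumes the message follows the format shown above. Cluster-scoped denials carry no namespace clause, so that field comes back empty.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split an RBAC "forbidden" message into its fields.
# Assumes the message format shown above; returns 1 if the text does not match.
parse_forbidden() {
  local msg="$1"
  # Groups: 1=principal 2=verb 3=resource 4=API group 6=namespace (optional)
  local re='User "([^"]+)" cannot ([a-z]+) resource "([^"]+)" in API group "([^"]*)"( in the namespace "([^"]+)")?'
  if [[ $msg =~ $re ]]; then
    printf 'principal=%s\nverb=%s\nresource=%s\napigroup=%s\nnamespace=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" \
      "${BASH_REMATCH[4]}" "${BASH_REMATCH[6]}"
  else
    return 1
  fi
}
```

Calling `parse_forbidden` with the message above prints one `key=value` line per field, which can be copied into the `kubectl auth can-i` checks later in this guide.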
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Missing RoleBinding or ClusterRoleBinding | Controller or pod logs show 403; kubectl returns forbidden | kubectl auth can-i <verb> <resource> --as=<principal> -n <ns> |
| Service account misidentified | Pod uses the wrong SA or --as omits the system:serviceaccount: prefix | kubectl get pod <pod> -o jsonpath='{.spec.serviceAccountName}' |
| Aggregated ClusterRole drift | Default aggregated roles like edit or admin may lack rules for newer resources | kubectl get clusterrole edit -o yaml and compare rules |
| GKE dual-layer IAM denial | 403 errors despite correct Kubernetes RBAC; GCP IAM is the second gate | Whether the caller has sufficient GCP IAM at the project/cluster level |
| EKS aws-auth mapping mismatch | IAM role assumed via STS, but the aws-auth ConfigMap lists an ARN that does not match the base IAM role ARN | kube-system/aws-auth ConfigMap for the mapped IAM ARN |
| Overuse of cluster-admin | Teams bind users to cluster-admin instead of namespace-scoped roles | `kubectl get clusterrolebinding -o json`, then look for bindings whose roleRef is cluster-admin |
Quick checks
Use these commands to narrow down who is failing and what they need.
# Identify the current caller (requires SelfSubjectReview API)
kubectl auth whoami
# Check if a specific principal can perform an action
kubectl auth can-i create pods --as=system:serviceaccount:dev:my-sa -n prod
# List every effective permission for a principal in a namespace
kubectl auth can-i --list --as=system:serviceaccount:dev:my-sa -n prod
# Find all RoleBindings that reference a user or service account
# (any() prints each binding once, even if several subjects match)
kubectl get rolebindings --all-namespaces -o json | \
  jq -r '.items[] | select(any(.subjects[]?; .name == "my-sa")) | "\(.metadata.namespace)/\(.metadata.name)"'
# Find all ClusterRoleBindings for a principal
kubectl get clusterrolebindings -o json | \
  jq -r '.items[] | select(any(.subjects[]?; .name == "my-user")) | .metadata.name'
# Check API server 403 rate from metrics (if accessible)
kubectl get --raw /metrics | grep 'apiserver_request_total.*code="403"'
# Check audit logs for recent 403 responses
# (filter on the parsed code; responseStatus usually carries other fields too)
jq -r 'select(.responseStatus.code == 403) | "\(.user.username) \(.verb) \(.objectRef.resource)"' /var/log/kubernetes/audit.log | \
  sort | uniq -c | sort -rn
# Verify whether a service account token is automounted unnecessarily
kubectl get pod my-pod -o jsonpath='{.spec.automountServiceAccountToken}'
How to diagnose it
Follow this flow to move from symptom to root cause.
1. Extract the principal and action from the failure message. The 403 error string contains the username, verb, resource, API group, and namespace. Record these before doing anything else.
2. Reproduce with `auth can-i`. Run `kubectl auth can-i <verb> <resource> --as=<principal> -n <ns>`. If it returns `no`, you have reproduced the authorization failure. If it returns `yes`, the principal might be using a different identity than you think, or the error came from an admission webhook.
3. Verify the principal exists. `kubectl auth can-i` silently evaluates permissions for any principal string, even if the service account does not exist. Confirm a service account with `kubectl get serviceaccount my-sa -n my-ns`. Users are managed outside the cluster; verify them against your identity provider.
4. Check existing bindings. Search RoleBindings and ClusterRoleBindings for the principal. If no bindings exist, the principal has no permissions beyond default group memberships.
5. Inspect the referenced role. If a binding exists but the denial persists, dump the Role or ClusterRole rules. Look for the exact verb and resource combination. Remember that subresources such as `pods/exec`, `pods/log`, and `serviceaccounts/token` require explicit rules.
6. Check for aggregated role drift. If the principal is bound to a default aggregated role like `edit` or `admin`, inspect the ClusterRole rules directly. Default roles are updated during upgrades and may lack rules for newer resources.
7. Validate cloud provider IAM layers. On GKE, verify GCP IAM roles independently of Kubernetes RBAC. On EKS, verify the aws-auth ConfigMap maps the IAM role ARN correctly, stripping any `/assumed-role/<role-name>/session` suffix.
8. Apply the minimum fix and re-verify. Create a Role or ClusterRole with the exact verb and resource, bind it with a RoleBinding or ClusterRoleBinding, and rerun `kubectl auth can-i` to confirm `yes`. If the principal is a workload, restart the pod to pick up a new projected token if needed.
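The reproduce and verify steps above can be sketched as two small Bash helpers. The function names `check_access` and `sa_exists` are hypothetical, and the sketch assumes `kubectl` is on the PATH with a context that is allowed to impersonate the principal. It relies on `kubectl auth can-i` printing `yes` or `no`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reproduce a denial for PRINCIPAL VERB RESOURCE NAMESPACE.
# Prints "allowed", "denied", or "unknown".
check_access() {
  local principal="$1" verb="$2" resource="$3" ns="$4" answer
  # can-i exits non-zero on "no", so capture the text and ignore the exit status
  answer="$(kubectl auth can-i "$verb" "$resource" --as="$principal" -n "$ns" 2>/dev/null || true)"
  case "$answer" in
    yes*) echo "allowed" ;;   # look at admission webhooks, cloud IAM, or the runtime identity
    no*)  echo "denied" ;;    # continue with bindings and role rules
    *)    echo "unknown" ;;   # kubectl failed, for example no impersonation rights
  esac
}

# Hypothetical helper: confirm that a service account principal really exists.
# Returns 0 if it exists, 1 if it does not, 2 if PRINCIPAL is not a service account.
sa_exists() {
  local principal="$1"
  local re='^system:serviceaccount:([^:]+):([^:]+)$'
  [[ $principal =~ $re ]] || return 2
  kubectl get serviceaccount "${BASH_REMATCH[2]}" -n "${BASH_REMATCH[1]}" >/dev/null 2>&1
}
```

For example, `check_access system:serviceaccount:dev:my-sa create pods prod` followed by `sa_exists system:serviceaccount:dev:my-sa` distinguishes a genuine denial from a check against a service account that was never created.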
flowchart TD
A[403 Forbidden in logs or kubectl] --> B[Extract principal, verb, resource, namespace]
B --> C[kubectl auth can-i --as=principal]
C -->|no| D[Check RoleBindings and ClusterRoleBindings]
C -->|yes| E[Check admission webhooks or cloud IAM]
D -->|missing| F[Create minimum Role and RoleBinding]
D -->|exists| G[Inspect role rules for exact verb/resource/subresource]
G -->|missing rule| H[Update role or create custom role]
E --> I[Verify GKE IAM or EKS aws-auth mapping]
F --> J[Re-run auth can-i to verify]
H --> J
I --> J
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| apiserver_request_total{code="403"} | Tracks the rate of RBAC denials across the cluster | Sustained rate above baseline or spikes from known service accounts |
| apiserver_unauthorized_requests_total | Counts authentication and authorization failures directly (Kubernetes 1.28+) | Any sustained increase indicates mass credential or permission issues |
| Audit log 403 patterns | Provides the exact principal, verb, and resource for every denial | New usernames or service accounts appearing in 403 lines |
| RBAC modification rate | Sudden increases in RoleBinding or ClusterRoleBinding creation may indicate privilege escalation or emergency over-permissioning | Bindings to cluster-admin outside of change windows |
| Self-subject access review rate | High volume of can-i or selfsubjectrulesreview calls may indicate reconnaissance or a compromised workload probing its permissions | Spikes from single service accounts |
| Controller workqueue depth | Controllers blocked by 403 will retry and accumulate workqueue depth | Depth growing for controllers that mutate resources |
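To see which resources account for the 403 volume, the `apiserver_request_total` series can be summed per resource from the raw metrics text. This is a rough sketch: the function name `forbidden_by_resource` is hypothetical, it assumes the Prometheus text format with a `resource` label and no trailing timestamp, and the counters are cumulative since API server start, so compare two snapshots to get a rate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text metrics on stdin and print
# "<resource> <total 403 count>" per resource, sorted by resource name.
forbidden_by_resource() {
  awk '
    /^apiserver_request_total[{]/ && /code="403"/ {
      resource = ""
      # Require "{" or "," before the label so that subresource="..." is not matched
      if (match($0, /[{,]resource="[^"]*"/)) {
        resource = substr($0, RSTART + 11, RLENGTH - 12)
      }
      if (resource == "") resource = "(none)"   # non-resource requests
      count[resource] += $NF                     # last field is the sample value
    }
    END { for (r in count) printf "%s %d\n", r, count[r] }
  ' | sort
}
```

A typical invocation would be `kubectl get --raw /metrics | forbidden_by_resource`, assuming the caller is allowed to read `/metrics`.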
Fixes
If the cause is a missing binding
For namespace-scoped access, create a Role with the exact verbs and resources and bind it to the principal with a RoleBinding. For cluster-scoped access, create a ClusterRole and bind it with a ClusterRoleBinding, because a ClusterRoleBinding can only reference a ClusterRole. Do not bind to cluster-admin unless the principal must manage RBAC itself.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: prod
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: prod
  name: pod-reader-binding
subjects:
- kind: ServiceAccount
  name: my-sa
  namespace: dev
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
If the cause is aggregated ClusterRole drift
If a controller depended on implicit permissions through the edit or admin role, create a dedicated ClusterRole that grants exactly the missing resource verbs and bind it alongside the existing role. Do not edit built-in aggregated roles directly; they are reconciled by the API server.
If the cause is a cloud provider IAM layer
On GKE, grant the corresponding GCP IAM role (for example, roles/container.developer) in addition to Kubernetes RBAC. Note that roles/container.admin grants cluster-admin-equivalent access across all clusters in the project, effectively overriding namespace-scoped RBAC.
On EKS, ensure the aws-auth ConfigMap contains the base IAM role ARN without the assumed-role session suffix, and verify the IAM identity also has a Kubernetes RBAC binding.
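The ARN conversion for EKS can be sketched in Bash as follows. The function name `base_role_arn` is hypothetical. The sketch assumes the caller identity has the usual STS shape, `arn:aws:sts::<account>:assumed-role/<role-name>/<session>`, and rewrites it to the base IAM role form that aws-auth expects. Anything else is passed through unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert an STS assumed-role ARN to the base IAM role ARN.
base_role_arn() {
  local arn="$1"
  # Groups: 1=partition (aws, aws-cn, ...) 2=account ID 3=role name
  local re='^arn:(aws[a-z-]*):sts::([0-9]+):assumed-role/([^/]+)/[^/]+$'
  if [[ $arn =~ $re ]]; then
    printf 'arn:%s:iam::%s:role/%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    # Already a base ARN, or a shape this sketch does not recognize
    printf '%s\n' "$arn"
  fi
}
```

Compare the output with the `rolearn` entries in the aws-auth ConfigMap. A mismatch there explains a denial that Kubernetes RBAC alone cannot.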
If the cause is overly broad permissions
Audit ClusterRoleBindings to cluster-admin and group memberships in system:masters. Membership in system:masters bypasses all RBAC checks permanently and cannot be revoked through RBAC. Replace these with the default admin, edit, or view roles, or with custom roles that expose only the required verbs. Pay special attention to the privilege escalation verbs escalate, bind, and impersonate, and to the sensitive subresources serviceaccounts/token and certificatesigningrequests/approval.
Prevention
Validate permissions in CI/CD. Run kubectl auth can-i against a dry-run cluster or a staging namespace before deploying workloads that use new service accounts. This catches missing permissions before they reach production.
Set automountServiceAccountToken to false by default. Applications that do not need the Kubernetes API should not receive a mounted token. Explicitly opt in only for workloads that need it.
Monitor RBAC change rate. Alert on unexpected ClusterRoleBinding creations, especially to cluster-admin. Treat RBAC modifications as security events.
Use LimitRanges or admission policies to enforce resource requests. Quota exhaustion is also reported as a 403 Forbidden (with an "exceeded quota" message), so it can look like a permission issue at first glance.
Test runbooks against a real cluster. Verify that your kubectl auth can-i commands and binding templates work with your actual identity provider, whether certificates, OIDC, or cloud IAM.
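The CI/CD validation idea above can be sketched as a Bash gate that reads the permissions a workload needs and fails the pipeline if any are missing. The function name `assert_permissions` and the `verb resource` input format are assumptions for illustration. The sketch relies on `kubectl auth can-i` exiting non-zero when the answer is `no`, and on the CI identity being allowed to impersonate the principal.

```bash
#!/usr/bin/env bash
# Hypothetical CI gate: read "verb resource" pairs on stdin and verify each one
# for PRINCIPAL in NAMESPACE. Returns 1 if any permission is missing.
assert_permissions() {
  local principal="$1" ns="$2" verb resource failed=0
  while read -r verb resource || [[ -n "$verb" ]]; do
    # Skip blank lines and comments
    [[ -z "$verb" || "$verb" == \#* ]] && continue
    # Redirect stdin so kubectl cannot consume the remaining list
    if ! kubectl auth can-i "$verb" "$resource" --as="$principal" -n "$ns" \
        </dev/null >/dev/null 2>&1; then
      echo "MISSING: $principal cannot $verb $resource in $ns"
      failed=1
    fi
  done
  return "$failed"
}
```

In a pipeline this could run as `assert_permissions system:serviceaccount:dev:my-sa prod < required-permissions.txt`, where the file name is a placeholder for wherever the team keeps the workload's expected permissions.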
How Netdata helps
Netdata collects the API server and workload signals relevant to RBAC denials:
- Correlate `apiserver_request_total` 403 spikes with controller or service account activity.
- Monitor API server latency alongside RBAC change events to detect authorization evaluation overhead.
- Track etcd write latency to distinguish RBAC issues from control plane saturation.
- Watch pod restart loops caused by permission denied errors in init containers or sidecars via the Kubernetes collector.
- Alert on anomalous API request patterns, such as a sudden increase in `selfsubjectaccessreviews` from a single namespace.
Related guides
- Kubernetes API server slow or unresponsive: causes and fixes
- Kubernetes API server rate limiting: APF priority levels and starvation
- Kubernetes Deployment rollout stuck: stalled rollouts and ready replicas
- Kubernetes DNS resolution failures inside pods
- Kubernetes API server etcd latency: detection and cascading failures






