Kubernetes RBAC permission denied: detection and minimum-permission fix

A 403 Forbidden from the Kubernetes API server means the caller was authenticated but RBAC refused the action. In production, this appears as a Deployment stuck creating pods, a CI pipeline failing to patch a ConfigMap, a controller logging repeated forbidden errors, or an operator unable to finalize a custom resource. One missing verb on one resource in one namespace blocks the entire workflow.

RBAC denials are localized: they do not cascade like network partitions or etcd latency. They are easy to miss in aggregate monitoring but block the affected workload completely. This guide shows how to detect the exact principal, verb, resource, and namespace; reproduce the denial with kubectl auth can-i; and apply the minimum permission fix without resorting to cluster-admin.

What this means

The API server processes every request through authentication, authorization, and admission. A 403 means authentication succeeded but authorization failed.

RBAC evaluates the requested verb, resource, subresource, API group, and namespace against the rules in the bound Role or ClusterRole. If no rule matches, the API server returns:

pods is forbidden: User "system:serviceaccount:dev:my-sa" cannot create resource "pods" in API group "" in the namespace "prod"

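The fix is to grant the minimum permission covering exactly that principal, verb, resource, API group, and namespace, not to bind the principal to cluster-admin. In the message above, the namespace embedded in the service account name (dev) is where the account lives, while the namespace at the end (prod) is where the action was attempted; the binding must exist in the latter.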
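
Because the denial message already names every field needed, it can be turned into a reproduction command mechanically. The following is a minimal sketch, assuming the namespaced message format shown above; the function name forbidden_to_can_i is hypothetical. It does not handle cluster-scoped denials or subresources such as pods/exec, which kubectl auth can-i expects through its --subresource flag.

```shell
# Turn a namespaced RBAC denial message into the matching
# "kubectl auth can-i" command. Prints nothing if the message
# does not match the expected format.
forbidden_to_can_i() {
  printf '%s\n' "$1" | sed -n \
    -e 's/.*User "\([^"]*\)" cannot \([a-z]*\) resource "\([^"]*\)" in API group "" in the namespace "\([^"]*\)".*/kubectl auth can-i \2 \3 --as=\1 -n \4/p' \
    -e 's/.*User "\([^"]*\)" cannot \([a-z]*\) resource "\([^"]*\)" in API group "\([^"]*\)" in the namespace "\([^"]*\)".*/kubectl auth can-i \2 \3.\4 --as=\1 -n \5/p'
}
```

The first expression covers the core API group (an empty string); the second appends the group to the resource, for example deployments.apps. Review the printed command before running it.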

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Missing RoleBinding or ClusterRoleBinding | Controller or pod logs show 403; kubectl returns forbidden | kubectl auth can-i <verb> <resource> --as=<principal> -n <ns> |
| Service account misidentified | Pod uses the wrong SA, or --as omits the system:serviceaccount: prefix | kubectl get pod <pod> -o jsonpath='{.spec.serviceAccountName}' |
| Aggregated ClusterRole drift | Default aggregated roles like edit or admin may lack rules for newer resources | kubectl get clusterrole edit -o yaml and compare rules |
| GKE dual-layer IAM denial | 403 errors despite correct Kubernetes RBAC; GCP IAM is the second gate | Whether the caller has sufficient GCP IAM at the project/cluster level |
| EKS aws-auth mapping mismatch | IAM role is assumed via STS, but the aws-auth ConfigMap does not map its base IAM role ARN | kube-system/aws-auth ConfigMap for the mapped IAM ARN |
| Overuse of cluster-admin | Teams bind users to cluster-admin instead of namespace-scoped roles | kubectl get clusterrolebinding -o json |

Quick checks

Use these commands to narrow down who is failing and what they need.

# Identify the current caller (requires the SelfSubjectReview API)
kubectl auth whoami
# Check if a specific principal can perform an action
kubectl auth can-i create pods --as=system:serviceaccount:dev:my-sa -n prod
# List every effective permission for a principal in a namespace
kubectl auth can-i --list --as=system:serviceaccount:dev:my-sa -n prod
# Find all RoleBindings that reference a service account
# (match on kind, name, and namespace so same-named subjects are not confused)
kubectl get rolebindings --all-namespaces -o json | \
  jq -r '.items[] | select(any(.subjects[]?; .kind == "ServiceAccount" and .name == "my-sa" and .namespace == "dev")) | "\(.metadata.namespace)/\(.metadata.name)"'
# Find all ClusterRoleBindings for a user
kubectl get clusterrolebindings -o json | \
  jq -r '.items[] | select(any(.subjects[]?; .kind == "User" and .name == "my-user")) | .metadata.name'
# Check API server 403 rate from metrics (if accessible)
kubectl get --raw /metrics | grep 'apiserver_request_total.*code="403"'
# Check audit logs for recent 403 responses (the path depends on --audit-log-path;
# filter on the parsed field because responseStatus carries other keys besides code)
jq -r 'select(.responseStatus.code == 403) | "\(.user.username) \(.verb) \(.objectRef.resource // "-")"' /var/log/kubernetes/audit.log | \
  sort | uniq -c | sort -rn
# Verify whether a service account token is automounted unnecessarily
# (empty output means the field is unset, which defaults to automounting)
kubectl get pod my-pod -o jsonpath='{.spec.automountServiceAccountToken}'
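
A common slip with --as is passing a bare service account name instead of the full principal. As a minimal sketch, the hypothetical helper below builds the principal string from a running pod, assuming the pod name and namespace are known:

```shell
# Print the RBAC principal a pod authenticates as.
# $1 = pod name, $2 = pod namespace
pod_principal() {
  sa=$(kubectl get pod "$1" -n "$2" -o jsonpath='{.spec.serviceAccountName}') || return 1
  # The field is normally defaulted by the API server; fall back to "default" if empty.
  printf 'system:serviceaccount:%s:%s\n' "$2" "${sa:-default}"
}
```

It could then feed the check directly, for example kubectl auth can-i create pods --as="$(pod_principal my-pod dev)" -n prod.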

How to diagnose it

Follow this flow to move from symptom to root cause.

  1. Extract the principal and action from the failure message. The 403 error string contains the username, verb, resource, API group, and namespace. Record these before doing anything else.

  2. Reproduce with auth can-i. Run kubectl auth can-i <verb> <resource> --as=<principal> -n <ns>. The --as flag impersonates the principal, so your own identity needs impersonation rights. If the command returns no, you have reproduced the authorization failure. If it returns yes, the workload might be using a different identity than you think, or the error came from an admission webhook rather than RBAC.

  3. Verify the principal exists. kubectl auth can-i silently evaluates permissions for any principal string, even if the service account does not exist. Confirm a service account with kubectl get serviceaccount my-sa -n my-ns. Users are managed outside the cluster; verify them against your identity provider.

  4. Check existing bindings. Search RoleBindings and ClusterRoleBindings for the principal. If no bindings exist, the principal has no permissions beyond default group memberships.

  5. Inspect the referenced role. If a binding exists but the denial persists, dump the Role or ClusterRole rules. Look for the exact verb and resource combination. Remember that subresources such as pods/exec, pods/log, and serviceaccounts/token require explicit rules.

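  6. Check for aggregated role drift. If the principal is bound to a default aggregated role like edit or admin, inspect the ClusterRole rules directly. The rules of these roles are assembled from ClusterRoles that carry aggregation labels, so resources added later, especially custom resources, are covered only if some labeled ClusterRole grants them.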

  7. Validate cloud provider IAM layers. On GKE, verify GCP IAM roles independently of Kubernetes RBAC. On EKS, verify that the aws-auth ConfigMap maps the base IAM role ARN (arn:aws:iam::<account>:role/<role-name>), not the STS session ARN that the caller presents (arn:aws:sts::<account>:assumed-role/<role-name>/<session>).

  8. Apply the minimum fix and re-verify. Create a Role or ClusterRole with the exact verb and resource, bind it with a RoleBinding or ClusterRoleBinding, and rerun kubectl auth can-i to confirm yes. RBAC changes take effect on the next request and do not require a new token; restart the pod only if you changed its service account or the workload has stopped retrying.

flowchart TD
    A[403 Forbidden in logs or kubectl] --> B[Extract principal, verb, resource, namespace]
    B --> C[kubectl auth can-i --as=principal]
    C -->|no| D[Check RoleBindings and ClusterRoleBindings]
    C -->|yes| E[Check admission webhooks or cloud IAM]
    D -->|missing| F[Create minimum Role and RoleBinding]
    D -->|exists| G[Inspect role rules for exact verb/resource/subresource]
    G -->|missing rule| H[Update role or create custom role]
    E --> I[Verify GKE IAM or EKS aws-auth mapping]
    F --> J[Re-run auth can-i to verify]
    H --> J
    I --> J
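
The first branches of this flow can be sketched as a small script. This is an illustration under assumptions, not a complete tool: diagnose_sa is a hypothetical name, it covers service accounts only, and it checks that the account exists before running can-i, since can-i answers for any principal string.

```shell
# $1 = SA namespace, $2 = SA name, $3 = verb, $4 = resource, $5 = target namespace
# Returns 0 if allowed, 1 if denied, 2 if the service account does not exist.
diagnose_sa() {
  principal="system:serviceaccount:$1:$2"

  # can-i evaluates any principal string, so confirm the account exists first.
  if ! kubectl get serviceaccount "$2" -n "$1" >/dev/null 2>&1; then
    echo "service account $1/$2 not found"
    return 2
  fi

  # Reproduce the denial. can-i exits non-zero when the answer is "no"
  # (and also when the caller lacks impersonation rights).
  if kubectl auth can-i "$3" "$4" --as="$principal" -n "$5" >/dev/null 2>&1; then
    echo "allowed: check admission webhooks, cloud IAM, or the identity actually in use"
    return 0
  fi

  # Show the bindings in the target namespace for manual inspection.
  echo "denied: $principal cannot $3 $4 in $5; RoleBindings in $5:"
  kubectl get rolebindings -n "$5" -o wide
  return 1
}
```

A call such as diagnose_sa dev my-sa create pods prod mirrors the example denial used earlier in this guide. ClusterRoleBindings still need a separate look, because they are not listed per namespace.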

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| apiserver_request_total{code="403"} | Tracks the rate of RBAC denials across the cluster | Sustained rate above baseline or spikes from known service accounts |
| apiserver_unauthorized_requests_total | Counts authentication and authorization failures directly (Kubernetes 1.28+); confirm that your API server version exposes this metric before alerting on it | Any sustained increase indicates mass credential or permission issues |
| Audit log 403 patterns | Provides the exact principal, verb, and resource for every denial | New usernames or service accounts appearing in 403 lines |
| RBAC modification rate | Sudden increases in RoleBinding or ClusterRoleBinding creation may indicate privilege escalation or emergency over-permissioning | Bindings to cluster-admin outside of change windows |
| Self-subject access review rate | High volume of can-i or selfsubjectrulesreview calls may indicate reconnaissance or a compromised workload probing its permissions | Spikes from single service accounts |
| Controller workqueue depth | Controllers blocked by 403 will retry and accumulate workqueue depth | Depth growing for controllers that mutate resources |
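
When no Prometheus query layer is at hand, the first signal can be summarized straight from the raw metrics text. The sketch below assumes the standard Prometheus exposition format with resource and verb labels on apiserver_request_total; forbidden_by_resource is a hypothetical name. The counters are cumulative since API server start, so the output shows where denials concentrate, not their current rate.

```shell
# Read Prometheus text from stdin and print total 403 counts
# per verb and resource, highest first.
forbidden_by_resource() {
  awk '
    index($0, "apiserver_request_total{") == 1 && /code="403"/ {
      resource = "unknown"; verb = "unknown"
      # Anchor on "{" or "," so that subresource="..." is not mistaken for resource="...".
      if (match($0, /[{,]resource="[^"]*"/)) resource = substr($0, RSTART + 11, RLENGTH - 12)
      if (match($0, /[{,]verb="[^"]*"/))     verb = substr($0, RSTART + 7, RLENGTH - 8)
      total[verb " " resource] += $NF
    }
    END { for (k in total) printf "%d %s\n", total[k], k }
  ' | sort -rn
}
```

Typical use would be kubectl get --raw /metrics | forbidden_by_resource. Note that the verb label holds request verbs such as POST or PATCH, not RBAC verbs.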

Fixes

If the cause is a missing binding

Create a Role with the exact verbs and resources and bind it to the principal with a RoleBinding for namespace-scoped access. For cluster-scoped access, define the rules in a ClusterRole and use a ClusterRoleBinding, because a ClusterRoleBinding cannot reference a Role. Do not bind to cluster-admin unless the principal must manage RBAC itself.

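apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: prod        # namespace where the action is performed
  name: pod-reader
rules:
- apiGroups: [""]        # "" is the core API group
  resources: ["pods"]
  # List only the verbs the workload needs; a denied "create" would need "create" here
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: prod        # must match the namespace of the Role
  name: pod-reader-binding
subjects:
- kind: ServiceAccount
  name: my-sa
  namespace: dev         # namespace where the service account lives
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io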
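
The same grant can be expressed imperatively and verified in one step. The following is a sketch under assumptions: grant_minimum is a hypothetical wrapper, and the "-binding" suffix is only a naming choice carried over from the manifest above.

```shell
# $1 = target namespace, $2 = role name, $3 = comma-separated verbs,
# $4 = resource, $5 = service account as <namespace>:<name>
grant_minimum() {
  # Create the Role and RoleBinding, then re-run the check for the first verb.
  kubectl create role "$2" -n "$1" --verb="$3" --resource="$4" &&
    kubectl create rolebinding "$2-binding" -n "$1" --role="$2" --serviceaccount="$5" &&
    kubectl auth can-i "${3%%,*}" "$4" --as="system:serviceaccount:$5" -n "$1"
}
```

For the manifest above this would be grant_minimum prod pod-reader get,list,watch pods dev:my-sa. Declarative manifests kept in version control remain the better record of what was granted; the imperative form is mainly useful for testing a fix quickly.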

If the cause is aggregated ClusterRole drift

If a controller depended on implicit permissions through the edit or admin role, create a dedicated ClusterRole that grants exactly the missing resource verbs and bind it alongside the existing role. Do not edit built-in aggregated roles directly; the control plane reconciles them and overwrites manual changes to their rules.
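
To size that dedicated ClusterRole, it helps to list which verbs the principal actually lacks. The sketch below is illustrative; missing_verbs is a hypothetical helper and widgets.example.com in the usage note stands in for whatever resource the controller needs.

```shell
# Print each verb the principal is NOT allowed to perform on a resource.
# $1 = principal, $2 = namespace, $3 = resource, remaining args = verbs to test
missing_verbs() {
  principal=$1; ns=$2; resource=$3
  shift 3
  for verb in "$@"; do
    kubectl auth can-i "$verb" "$resource" --as="$principal" -n "$ns" >/dev/null 2>&1 ||
      printf '%s\n' "$verb"
  done
}
```

For example, missing_verbs system:serviceaccount:dev:my-sa prod widgets.example.com get list create patch prints only the verbs that need to go into the new ClusterRole, which keeps the supplement as narrow as the gap it fills.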

If the cause is a cloud provider IAM layer

On GKE, grant the corresponding GCP IAM role (for example, roles/container.developer) in addition to Kubernetes RBAC. Note that roles/container.admin grants cluster-admin-equivalent access across all clusters in the project, effectively overriding namespace-scoped RBAC.

On EKS, ensure the aws-auth ConfigMap contains the base IAM role ARN (arn:aws:iam::<account>:role/<role-name>) and not the STS assumed-role session ARN, and verify that the mapped username or group also has a Kubernetes RBAC binding.
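
Converting the session ARN that a caller presents into the form aws-auth expects is a plain string rewrite. A minimal sketch, with base_role_arn as a hypothetical helper name:

```shell
# arn:<partition>:sts::<account>:assumed-role/<role>/<session>
#   -> arn:<partition>:iam::<account>:role/<role>
# Input that is not an assumed-role ARN is printed unchanged.
base_role_arn() {
  printf '%s\n' "$1" |
    sed 's#^arn:\([^:]*\):sts::\([0-9]*\):assumed-role/\([^/]*\)/.*$#arn:\1:iam::\2:role/\3#'
}
```

The input can come from aws sts get-caller-identity --query Arn --output text, and the result is the value to look for in the mapRoles section of the ConfigMap.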

If the cause is overly broad permissions

Audit ClusterRoleBindings to cluster-admin and group memberships in system:masters. Membership in system:masters bypasses all RBAC checks permanently and cannot be revoked through RBAC. Replace these with the default admin, edit, or view roles, or with custom roles that expose only the required verbs. Pay special attention to the privilege escalation verbs escalate, bind, and impersonate, and to sensitive subresources: create on serviceaccounts/token mints tokens for other identities, and update on certificatesigningrequests/approval allows approving client certificates.
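
The first part of that audit can be sketched with kubectl and awk alone. The helper name cluster_admin_bindings is hypothetical, and the sketch assumes binding and subject names contain no whitespace.

```shell
# Print each ClusterRoleBinding that references cluster-admin,
# followed by its comma-separated subject names.
cluster_admin_bindings() {
  kubectl get clusterrolebindings --no-headers \
    -o custom-columns='NAME:.metadata.name,ROLE:.roleRef.name,SUBJECTS:.subjects[*].name' |
    awk '$2 == "cluster-admin" { print $1 ": " $3 }'
}
```

This lists only bindings. Who belongs to a bound group such as system:masters is decided by the authenticator (certificates, OIDC, or cloud IAM) and has to be reviewed there.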

Prevention

Validate permissions in CI/CD. Run kubectl auth can-i against a test cluster or a staging namespace before deploying workloads that use new service accounts. This catches missing permissions before they reach production.
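
Such a pipeline check could look like the sketch below. It assumes the required permissions are kept as one "<verb> <resource>" pair per line; check_permissions is a hypothetical name.

```shell
# $1 = principal, $2 = namespace; stdin = one "<verb> <resource>" pair per line.
# Prints every missing permission and returns non-zero if any is missing.
check_permissions() {
  status=0
  while read -r verb resource; do
    [ -n "$verb" ] || continue   # skip blank lines
    # </dev/null keeps kubectl from consuming the list on stdin.
    if ! kubectl auth can-i "$verb" "$resource" --as="$1" -n "$2" >/dev/null 2>&1 </dev/null; then
      echo "MISSING: $1 cannot $verb $resource in $2"
      status=1
    fi
  done
  return "$status"
}
```

A pipeline step might run check_permissions system:serviceaccount:dev:my-sa prod < required-permissions.txt, where the file name is an assumption, and fail the job on a non-zero exit.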

Set automountServiceAccountToken to false by default. Applications that do not need the Kubernetes API should not receive a mounted token. Explicitly opt in only for workloads that need it.

Monitor RBAC change rate. Alert on unexpected ClusterRoleBinding creations, especially to cluster-admin. Treat RBAC modifications as security events.

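Use LimitRanges or admission policies to enforce resource requests. ResourceQuota violations are also reported as 403 Forbidden, with an "exceeded quota" message, so they can be mistaken for RBAC denials; read the full message before changing any bindings.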

Test runbooks against a real cluster. Verify that your kubectl auth can-i commands and binding templates work with your actual identity provider, whether certificates, OIDC, or cloud IAM.

How Netdata helps

Netdata collects the API server and workload signals relevant to RBAC denials:

  • Correlate apiserver_request_total 403 spikes with controller or service account activity.
  • Monitor API server latency alongside RBAC change events to detect authorization evaluation overhead.
  • Track etcd write latency to distinguish RBAC issues from control plane saturation.
  • Watch pod restart loops caused by permission denied errors in init containers or sidecars via the Kubernetes collector.
  • Alert on anomalous API request patterns, such as a sudden increase in selfsubjectaccessreviews from a single namespace.