Kubernetes API server certificate rotation: detection and grace handling

Kubernetes control plane certificates created by kubeadm expire after one year. The API server does not auto-rotate its serving certificate. When it expires, kubectl, kubelets, and the other control plane components can no longer complete a TLS handshake with the API server. If the API server's etcd client certificate has expired as well, etcd rejects its connections too. The cluster becomes unreachable. The failure is sudden and total.

This guide covers kubeadm-managed clusters where you own the control plane. It shows how to distinguish the API server serving certificate from the broader control plane bundle, detect expiration before it causes an outage, and renew with minimal disruption.

What this means

The Kubernetes API server relies on TLS for every connection: clients verify its serving certificate, and the API server authenticates to etcd and kubelets using client certificates. In kubeadm-managed clusters, these certificates are valid for 365 days by default. The CA certificates that sign them are valid for ten years. There is no built-in daemon that renews the API server serving certificate before expiration.

When the API server serving certificate expires, every client that verifies it rejects the TLS handshake. kubectl commands break. The controller manager and scheduler lose API access. If the apiserver-etcd-client certificate has also expired, etcd rejects the API server's connections as well. The cluster enters a hard down state that requires manual recovery on each control plane node.

The supported renewal path is kubeadm certs renew, followed by a restart of the affected static Pods so they pick up the new files on disk. The API server can reload some certificates from disk without a restart, but the canonical and safest approach remains the static Pod manifest cycle.

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| kubeadm control plane certificates approaching the 1-year expiration | kubeadm certs check-expiration shows < 30 days remaining | Certificate dates on disk |
| Renewed certificates on disk but static Pods not restarted | API server still reports the old expiry after kubeadm certs renew | Static Pod manifest timestamps and process start time |
| Workstation kubeconfigs holding expired client certificates | kubectl commands fail with 401 or TLS errors even though the server certificate was renewed | Embedded client certificate dates in each kubeconfig |
| Clock skew across control plane nodes | Certificates appear expired before their actual date, or "not yet valid" errors | timedatectl or ntpstat on each node |
| External CA mode preventing automatic signing | Kubelet client CSRs pile up without being signed, compounding the authentication failures | Absence of ca.key in /etc/kubernetes/pki/ |

Quick checks

# Check all kubeadm-managed certificate expiration dates
kubeadm certs check-expiration
# Inspect the API server serving certificate directly
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates
# Check the etcd client certificate used by the API server
openssl x509 -in /etc/kubernetes/pki/apiserver-etcd-client.crt -noout -dates
# Verify kubelet client certificate expiry on a node
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
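# Check kubelet logs for TLS errors
journalctl -u kubelet --since "10 minutes ago" | grep -iE "apiserver|certificate|tls|x509"
# Check kube-apiserver container logs (running and exited containers) through the container runtime
for id in $(crictl ps -a --name kube-apiserver -q); do crictl logs --tail 200 "$id" 2>&1; done | grep -iE "certificate|tls|x509"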
# Check for certificate/key modulus mismatch
openssl x509 -noout -modulus -in /etc/kubernetes/pki/apiserver.crt | openssl md5
openssl rsa -noout -modulus -in /etc/kubernetes/pki/apiserver.key | openssl md5
# Check kubeconfig expiry embedded in admin.conf
grep client-certificate-data /etc/kubernetes/admin.conf | head -1 | awk '{print $2}' | base64 -d | openssl x509 -noout -dates
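# Verify clock synchronization across control plane nodes (cp1..cp3 are placeholder hostnames)
# Older systemd prints "NTP synchronized", newer prints "System clock synchronized"
for node in cp1 cp2 cp3; do echo "$node: $(ssh "$node" timedatectl | grep -i synchronized)"; done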
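kubeadm certs check-expiration remains the authoritative view. When you want a sortable list for scripting, or need to inspect a copied PKI tree, a minimal sketch built on openssl can print the days left for every certificate under the PKI directory. It assumes GNU date (for date -d) and the default kubeadm layout. The function names are illustrative and not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: list days remaining for every certificate in a kubeadm PKI tree.
# Assumes GNU date (for `date -d`) and the default /etc/kubernetes/pki layout.
# CA certificates (ca.crt, front-proxy-ca.crt, etcd/ca.crt) are listed too.

# cert_days_left FILE -> whole days until notAfter (truncated; negative once expired)
cert_days_left() {
  local end end_epoch
  end=$(openssl x509 -in "$1" -noout -enddate) || return 1
  # openssl prints "notAfter=<date>"; strip the prefix before parsing
  end_epoch=$(date -d "${end#notAfter=}" +%s) || return 1
  echo $(( (end_epoch - $(date +%s)) / 86400 ))
}

# sweep_kubeadm_certs [PKI_DIR] -> "<days> <file>" lines, soonest expiry first
sweep_kubeadm_certs() {
  local pki="${1:-/etc/kubernetes/pki}" cert days
  for cert in "$pki"/*.crt "$pki"/etcd/*.crt; do
    [ -f "$cert" ] || continue          # skip globs that matched nothing
    days=$(cert_days_left "$cert") || continue
    printf '%s %s\n' "$days" "$cert"
  done | sort -n
}
```

On a control plane node, sweep_kubeadm_certs prints the soonest-expiring certificates first. Pass another directory as the first argument to inspect a backup or copied PKI tree.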
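timedatectl reports whether each node believes it is synchronized. It does not report how far apart the nodes actually are. A rough skew measurement, assuming key-based SSH access to every node, reads each node's epoch time and reports the spread. The readings are taken one after another, so the result includes SSH latency of around a second. That precision is enough to catch the multi-minute drift that breaks certificate validation.

```shell
#!/usr/bin/env bash
# Sketch: estimate clock skew across control plane nodes over SSH.
# Node names passed to collect_epochs are placeholders for your hosts.

# collect_epochs NODE... -> one `date +%s` reading per node
collect_epochs() {
  local node
  for node in "$@"; do
    ssh -o BatchMode=yes "$node" date +%s
  done
}

# max_skew EPOCH... -> largest minus smallest reading, in seconds
max_skew() {
  local min="$1" max="$1" t
  for t in "$@"; do
    if [ "$t" -lt "$min" ]; then min=$t; fi
    if [ "$t" -gt "$max" ]; then max=$t; fi
  done
  echo $(( max - min ))
}
```

For example, max_skew $(collect_epochs cp1 cp2 cp3) prints the spread in seconds. A result above a few seconds is worth checking against the NTP configuration on each node.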

How to diagnose it

  1. Confirm expiration with kubeadm. Run kubeadm certs check-expiration on a control plane node. It prints each kubeadm-managed certificate and CA, the time remaining, and whether it is externally managed. If any critical certificate shows less than 7 days, treat it as an emergency.

  2. Identify which certificates are affected. The API server serving certificate (/etc/kubernetes/pki/apiserver.crt) is the most visible. Also check the API server's etcd client certificate (apiserver-etcd-client.crt), its client certificate for connecting to kubelets (apiserver-kubelet-client.crt), and the front-proxy client certificate (front-proxy-client.crt). An expired etcd client certificate breaks the API server's storage backend even if the serving certificate is still valid.

  3. Check whether static Pods were restarted after any recent renewal. If someone ran kubeadm certs renew but did not restart the kube-apiserver static Pod, the running process may still hold the old certificates in memory. Compare the Pod's start time with the modification times of the certificate files.

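  4. Look for TLS errors in kubelet and container runtime logs. Expired certificates produce two kinds of messages. "x509: certificate has expired or is not yet valid" is logged by the side that checked a certificate and rejected it. "remote error: tls: bad certificate" is logged by the side whose certificate the peer rejected. Collect these logs to confirm which side of a connection is rejecting the handshake.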

  5. Validate kubeconfig freshness. kubeadm certs renew updates /etc/kubernetes/admin.conf on the control plane node. Copies of that file still contain the old client certificate, including $HOME/.kube/config on the same node and any copies distributed to workstations or CI/CD systems. Check the embedded certificate dates in those files.

  6. Check for clock skew. Run timedatectl on every control plane and etcd node. A node that is more than a few minutes out of sync can cause certificates to appear invalid before or after their true lifetime.

  7. If the cluster is already down due to expiry, assess etcd health and quorum. If the etcd server, peer, or client certificates have also expired, etcd members cannot connect to each other, and the API server cannot connect to etcd. In that case, renew the etcd certificates and restart the etcd static Pods before the API server can recover.
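Step 3 can be scripted. The sketch below assumes it runs on the control plane node itself, where the kube-apiserver process of a static Pod is visible to pgrep and ps. It also assumes GNU stat and a procps ps that supports the etimes field. It compares the process start time with the modification time of each API server certificate. The API server can reload some certificates from disk without a restart, so a STALE result is a strong hint rather than proof.

```shell
#!/usr/bin/env bash
# Sketch for step 3: flag certificates modified after kube-apiserver started.
# Assumes GNU stat, procps `ps` with the `etimes` field, and pgrep.

# is_stale CERT_MTIME PROC_START -> exit 0 if the file changed after the process started
is_stale() {
  [ "$1" -gt "$2" ]
}

# check_apiserver_certs [PKI_DIR] -> one "STALE"/"OK" line per certificate found
check_apiserver_certs() {
  local pki="${1:-/etc/kubernetes/pki}" pid elapsed start cert mtime
  pid=$(pgrep -o -x kube-apiserver) || { echo "kube-apiserver not running" >&2; return 2; }
  elapsed=$(ps -o etimes= -p "$pid" | tr -d ' ')   # seconds since process start
  start=$(( $(date +%s) - elapsed ))
  for cert in apiserver.crt apiserver-etcd-client.crt \
              apiserver-kubelet-client.crt front-proxy-client.crt; do
    [ -f "$pki/$cert" ] || continue
    mtime=$(stat -c %Y "$pki/$cert")
    if is_stale "$mtime" "$start"; then
      echo "STALE $cert"
    else
      echo "OK    $cert"
    fi
  done
}
```

Any STALE line means the certificate on disk is newer than the running process. The manifest swap in the Fixes section brings the two back in line.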
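For step 4, a small filter can count how many log lines point at each side of the handshake. This sketch recognizes only the two messages quoted in that step. Every other line is counted as "other", so treat it as a quick triage aid rather than a complete TLS error parser.

```shell
#!/usr/bin/env bash
# Sketch for step 4: sort TLS error lines by which side rejected the handshake.

# classify_tls_line LINE -> local-rejected | peer-rejected | other
classify_tls_line() {
  case "$1" in
    *"x509: certificate has expired or is not yet valid"*)
      echo "local-rejected" ;;  # the logging process rejected the peer's certificate
    *"remote error: tls: bad certificate"*)
      echo "peer-rejected" ;;   # the peer rejected the logging process's certificate
    *)
      echo "other" ;;
  esac
}

# summarize_tls_errors < LOG -> "<count> <class>" lines
summarize_tls_errors() {
  local line
  while IFS= read -r line; do
    classify_tls_line "$line"
  done | sort | uniq -c
}
```

For example: journalctl -u kubelet --since "10 minutes ago" | summarize_tls_errors. Run the same filter over the API server container logs to see the other side of the connection.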

flowchart TD
    A[kubeadm certs check-expiration] --> B{Any cert < 30 days?}
    B -->|Yes| C[Identify affected cert type]
    C --> D[API server serving cert]
    C --> E[etcd client cert]
    C --> F[Front-proxy client cert]
    D --> G[Run kubeadm certs renew all]
    E --> G
    F --> G
    G --> H{Cluster still healthy?}
    H -->|Yes| I[Restart static Pods one node at a time]
    H -->|No| J[Restart etcd static Pods first, then API server]
    I --> K[Verify with openssl and kubectl]
    J --> K
    B -->|No| L[Schedule next check before 30 day threshold]

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| Days until API server certificate expiration | The primary leading indicator | Less than 30 days remaining on any control plane certificate |
| API server 401/403 error rate spike | Expired client certificates cause mass authentication failures | Sudden increase in `apiserver_request_total{code=~"401\|403"}` |
| API server livez/readyz TLS failures | Serving certificate expiry breaks all new connections | Non-200 responses or TLS handshake failures on /livez or /readyz |
| etcd connectivity errors from the API server | etcd client certificate expiry severs the storage backend | etcd check failing in /readyz?verbose |
| Node NotReady rate | Expired kubelet client certificates break node heartbeats | Multiple nodes transition to NotReady simultaneously |
| kube-apiserver static Pod start time and restart count | A start time later than the renewal confirms the Pod was recreated and loaded the new certificates | Start time earlier than the renewal, or restart count climbing without a planned rollout |

Fixes

If certificates are approaching expiration but still valid

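Run kubeadm certs renew all on each control plane node. This renews the kubeadm-managed certificates in /etc/kubernetes/pki/, but not the CA certificates. It also renews the client certificates embedded in kubeconfig files such as admin.conf, controller-manager.conf, and scheduler.conf. After renewal, restart the static Pods so they load the new files.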

Warning: The manifest swap below disrupts the control plane component for the duration of Pod termination and recreation. Perform this one node at a time in HA clusters to maintain quorum and API availability.

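On each control plane node, move the manifest out of the kubelet's watch directory, wait for the kubelet to stop the Pod, then move it back. The kubelet rescans its manifest directory every 20 seconds by default, so a 25-second pause gives it time to notice the removal: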

mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
sleep 25
mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/

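Repeat for kube-controller-manager.yaml and kube-scheduler.yaml, whose kubeconfig client certificates are also renewed by kubeadm certs renew all. On nodes that run stacked etcd, repeat for etcd.yaml as well.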
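A fixed sleep does not confirm that the Pod actually stopped or that it came back. A sketch that polls the container runtime instead assumes crictl is installed and configured on the node. MANIFEST_DIR, PARK_DIR, POLL_SECONDS, and TIMEOUT_SECONDS are illustrative variable names, with defaults that match the commands above.

```shell
#!/usr/bin/env bash
# Sketch: cycle a static Pod by moving its manifest out of the kubelet's
# watch directory and back, polling crictl instead of sleeping a fixed time.
# Assumes crictl is installed and configured for the node's container runtime.

MANIFEST_DIR="${MANIFEST_DIR:-/etc/kubernetes/manifests}"
PARK_DIR="${PARK_DIR:-/tmp}"
POLL_SECONDS="${POLL_SECONDS:-2}"
TIMEOUT_SECONDS="${TIMEOUT_SECONDS:-120}"

# container_running NAME -> exit 0 if a running container is named exactly NAME
container_running() {
  [ -n "$(crictl ps --name "^$1\$" -q 2>/dev/null)" ]
}

# wait_for gone|running NAME -> exit 0 once the container reaches that state
wait_for() {
  local state="$1" name="$2" waited=0
  while [ "$waited" -lt "$TIMEOUT_SECONDS" ]; do
    if container_running "$name"; then
      [ "$state" = running ] && return 0
    else
      [ "$state" = gone ] && return 0
    fi
    sleep "$POLL_SECONDS"
    waited=$(( waited + POLL_SECONDS ))
  done
  echo "timed out waiting for $name to be $state" >&2
  return 1
}

# restart_static_pod NAME -> e.g. kube-apiserver, kube-controller-manager, etcd
restart_static_pod() {
  local name="$1"
  mv "$MANIFEST_DIR/$name.yaml" "$PARK_DIR/" || return 1
  if ! wait_for gone "$name"; then
    mv "$PARK_DIR/$name.yaml" "$MANIFEST_DIR/"   # restore the manifest on timeout
    return 1
  fi
  mv "$PARK_DIR/$name.yaml" "$MANIFEST_DIR/" || return 1
  wait_for running "$name"
}
```

Run restart_static_pod kube-apiserver, then repeat for the other components, one control plane node at a time. On timeout the function restores the manifest, so the kubelet is never left without it.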

If the API server certificate has already expired

The cluster is likely down. Perform the renewal procedure from the node itself or via an out-of-band session because kubectl will fail.

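  1. Renew certificates with kubeadm certs renew all. On nodes that run stacked etcd, this also renews the etcd certificates.
  2. If the etcd certificates had expired, restart the etcd static Pods first, using the manifest swap procedure above on etcd.yaml, so etcd regains quorum.
  3. Restart the API server static Pod using the manifest swap procedure above, then kube-controller-manager and kube-scheduler.
  4. Once the API server responds, verify with kubectl --kubeconfig /etc/kubernetes/admin.conf get --raw=/livez and get --raw=/readyz. The /healthz endpoint is deprecated.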
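The API server can take a while to come back after the restart, so a single check may fail spuriously. This sketch polls /livez and /readyz through the node's admin.conf until each returns ok. KUBECONFIG_FILE and the attempt and delay defaults are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch: poll API server health endpoints after recovery.
# Uses the node's admin.conf by default; attempts and delay are arbitrary.

KUBECONFIG_FILE="${KUBECONFIG_FILE:-/etc/kubernetes/admin.conf}"

# wait_for_endpoint PATH [ATTEMPTS] [DELAY_SECONDS] -> exit 0 once the body is "ok"
wait_for_endpoint() {
  local path="$1" attempts="${2:-30}" delay="${3:-5}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(kubectl --kubeconfig "$KUBECONFIG_FILE" get --raw="$path" 2>/dev/null)" = ok ]; then
      echo "$path ok"
      return 0
    fi
    i=$(( i + 1 ))
    sleep "$delay"
  done
  echo "$path still failing after $attempts attempts" >&2
  return 1
}
```

Typical use is wait_for_endpoint /livez && wait_for_endpoint /readyz. If /readyz keeps failing, kubectl --kubeconfig /etc/kubernetes/admin.conf get --raw='/readyz?verbose' lists the individual checks, including etcd.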

If kubeconfigs are stale after renewal

The client certificate embedded in /etc/kubernetes/admin.conf is renewed on disk, but any copy of that file on workstations or in automation pipelines retains the old certificate. Redistribute the updated admin.conf or generate new kubeconfigs for each consumer.
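To find stale copies, you can check the embedded client certificate in each kubeconfig you know about. This is a minimal sketch with two assumptions. First, each file embeds its certificate as client-certificate-data; files that reference a certificate path or use tokens are reported as no-cert. Second, the first such entry is the one in use, which holds for single-user files like admin.conf.

```shell
#!/usr/bin/env bash
# Sketch: check the client certificate embedded in kubeconfig files.
# Assumes GNU base64 and that the first client-certificate-data entry is in use.

# kubeconfig_cert_ok FILE [SECONDS] -> 0 if the cert outlives SECONDS, 1 if not, 2 if none
kubeconfig_cert_ok() {
  local file="$1" within="${2:-2592000}" data    # default window: 30 days
  data=$(awk '/client-certificate-data:/ {print $2; exit}' "$file")
  [ -n "$data" ] || return 2
  printf '%s' "$data" | base64 -d | openssl x509 -noout -checkend "$within" >/dev/null
}

# check_kubeconfigs FILE... -> one status line per file
check_kubeconfigs() {
  local f rc
  for f in "$@"; do
    if kubeconfig_cert_ok "$f"; then rc=0; else rc=$?; fi
    case $rc in
      0) echo "ok       $f" ;;
      2) echo "no-cert  $f" ;;
      *) echo "EXPIRING $f" ;;
    esac
  done
}
```

For example, check_kubeconfigs /etc/kubernetes/admin.conf "$HOME/.kube/config" reports whether either copy expires within 30 days. Pass a different window in seconds as the second argument to kubeconfig_cert_ok.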

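If the API server fails to load its certificate pair

If the certificate and private key do not belong together, the API server cannot use the pair. For RSA keys, this shows up as differing moduli. The symptoms are startup failures or TLS handshake errors.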

Verify the modulus matches:

openssl x509 -noout -modulus -in /etc/kubernetes/pki/apiserver.crt | openssl md5
openssl rsa -noout -modulus -in /etc/kubernetes/pki/apiserver.key | openssl md5

If the hashes differ, the certificate and key pair are inconsistent. Regenerate the affected certificate.
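The modulus comparison works only for RSA keys, which are kubeadm's default. Comparing the public keys themselves also covers ECDSA keys. A minimal sketch:

```shell
#!/usr/bin/env bash
# Sketch: confirm a certificate and private key belong together by comparing
# public keys. `openssl pkey` handles RSA and ECDSA; `openssl rsa` is RSA-only.

# cert_matches_key CERT KEY -> 0 if they match, 1 if not, 2 if either is unreadable
cert_matches_key() {
  local cert_pub key_pub
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
  key_pub=$(openssl pkey -in "$2" -pubout) || return 2
  [ "$cert_pub" = "$key_pub" ]
}
```

For example: cert_matches_key /etc/kubernetes/pki/apiserver.crt /etc/kubernetes/pki/apiserver.key && echo match || echo mismatch.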

Prevention

  • Alert early. Page if any control plane certificate expires in less than 24 hours. Ticket if less than 7 days. Plan renewal if less than 30 days.
  • Automate renewal testing. Run kubeadm certs check-expiration weekly in a CI pipeline. Rehearse the full renewal and static Pod restart quarterly in a staging cluster.
  • Use kubeadm upgrades. By default, kubeadm upgrade apply, and kubeadm upgrade node on the other control plane nodes, renews the control plane certificates. Do not pass --certificate-renewal=false unless you have an external renewal process.
  • Keep NTP synchronized. Certificate validation is time-dependent. Ensure all control plane and etcd nodes use reliable time synchronization.
  • Track certificate consumers. Maintain an inventory of every system that holds a copy of /etc/kubernetes/admin.conf or uses a client certificate signed by the cluster CA. Rotate those copies immediately after control plane renewal.
  • Consider Cluster API if applicable. With Cluster API's kubeadm control plane provider, setting spec.rolloutBefore.certificatesExpiryDays on the KubeadmControlPlane resource triggers a rollout of control plane machines before their certificates expire.
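The page, ticket, and plan thresholds above can be computed without parsing dates, because openssl x509 -checkend reports whether a certificate expires within a given number of seconds. The sketch below suits a weekly scheduled job. The tier names are illustrative labels, and an unparsable certificate is deliberately treated as urgent.

```shell
#!/usr/bin/env bash
# Sketch: map a certificate to alert tiers (24 hours / 7 days / 30 days)
# using `openssl x509 -checkend`, which exits non-zero if the certificate
# expires within the given number of seconds.

DAY=86400

# expiry_tier CERT -> page | ticket | plan | ok | unreadable
expiry_tier() {
  local cert="$1"
  [ -r "$cert" ] || { echo "unreadable"; return 2; }
  if ! openssl x509 -in "$cert" -noout -checkend "$DAY" >/dev/null 2>&1; then
    echo page       # expires within 24 hours, already expired, or unparsable
  elif ! openssl x509 -in "$cert" -noout -checkend $(( 7 * DAY )) >/dev/null 2>&1; then
    echo ticket     # expires within 7 days
  elif ! openssl x509 -in "$cert" -noout -checkend $(( 30 * DAY )) >/dev/null 2>&1; then
    echo plan       # expires within 30 days
  else
    echo ok
  fi
}
```

A scheduled job can loop over /etc/kubernetes/pki/*.crt and /etc/kubernetes/pki/etcd/*.crt and raise an alert for any result other than ok.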

How Netdata helps

  • Monitor API server /livez and /readyz endpoint latency to correlate certificate expiry with the exact moment availability degrades.
  • Track apiserver_request_total by response code to surface the 401 spike that precedes total expiry.
  • Monitor the kube-apiserver static Pod's restarts and state changes to verify that it was recreated after a certificate renewal.
  • View etcd latency and API server latency together to distinguish etcd client certificate expiry from serving certificate expiry.