Kubernetes pod exits immediately: how to diagnose it

When a pod shows Completed or Error with zero restarts, the container exited on its first run. The diagnostic evidence lives in termination metadata, not in a growing restart count. This is distinct from CrashLoopBackOff, where the kubelet has already applied exponential backoff after multiple restarts.

This guide covers how to distinguish a clean exit, an OOM kill, an application crash, and a configuration error using only the kubelet’s reported container state and the terminated container’s logs, plus which node-level and control-plane signals to check when the container produced no logs.

What this means

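What happens after a container exits depends on the pod’s restartPolicy. Under Never, the pod becomes terminal: phase Succeeded if every container exited with code 0, or Failed if any exited non-zero. Under OnFailure, only a clean exit is terminal; a non-zero exit triggers a restart. Deployment pods must use restartPolicy: Always, so even a clean exit triggers a restart, the phase stays Running, and kubectl get pods shows Completed, Error, or CrashLoopBackOff in the STATUS column.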
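The restart rules above can be written as a small decision function. This is a minimal sketch for a single-container pod; the helper name restart_outcome is hypothetical, and it models only the restartPolicy decision, not backoff timing.

```shell
# Sketch: what the kubelet does after a single-container pod's container exits.
# restart_outcome is a hypothetical helper; it ignores backoff timing.
# Usage: restart_outcome <restartPolicy> <exitCode>
restart_outcome() {
  policy=$1
  code=$2
  case "$policy" in
    Always)
      # Deployments must use this policy: every exit, even code 0, restarts.
      echo "restart (phase stays Running)" ;;
    OnFailure)
      # Only a non-zero exit triggers a restart.
      if [ "$code" -ne 0 ]; then
        echo "restart (phase stays Running)"
      else
        echo "terminal (phase Succeeded)"
      fi ;;
    Never)
      # The pod is terminal; the exit code decides the final phase.
      if [ "$code" -eq 0 ]; then
        echo "terminal (phase Succeeded)"
      else
        echo "terminal (phase Failed)"
      fi ;;
    *)
      echo "unknown restartPolicy: $policy" >&2
      return 2 ;;
  esac
}

# Example: a one-shot task that exits cleanly inside a Deployment.
#   restart_outcome Always 0   -> restart (phase stays Running)
```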

Immediately after the first termination, RESTARTS is still 0 and kubectl describe pod shows the exit code and reason under State: Terminated (status.containerStatuses[].state.terminated). When the kubelet restarts the container, that record moves to Last State: Terminated (lastState.terminated) and State becomes Running or Waiting. Each later restart overwrites lastState, so capture the first exit before it is replaced.
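Because the record moves between two fields, a script that captures it should check both. A minimal sketch, assuming kubectl already targets the pod’s namespace and the container of interest is the first entry in containerStatuses; the helper name termination_record is hypothetical.

```shell
# Sketch: print the most recent termination record for one container.
# Assumes kubectl targets the pod's namespace and the container of interest
# is containerStatuses[0]; termination_record is a hypothetical helper name.
termination_record() {
  pod=$1
  # Before any restart, the exit lives in state.terminated.
  current=$(kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].state.terminated}')
  if [ -n "$current" ]; then
    printf 'state.terminated %s\n' "$current"
    return 0
  fi
  # After a restart, it has moved to lastState.terminated.
  previous=$(kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].lastState.terminated}')
  if [ -n "$previous" ]; then
    printf 'lastState.terminated %s\n' "$previous"
    return 0
  fi
  echo "no termination record for $pod" >&2
  return 1
}

# Usage: termination_record <pod-name> > first-exit.txt
```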

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| One-shot command with restartPolicy: Always | Pod exits cleanly (code 0) but is restarted | kubectl describe pod for State (or Last State): Terminated, Reason: Completed, Exit Code: 0 |
| OOMKilled | Exit code 137, often with no application logs | kubectl describe pod for Reason: OOMKilled; node MemoryPressure condition |
| Application startup crash | Exit code 1, stack trace or config error in logs | kubectl logs <pod> (add --previous after a restart) |
| Missing secret, configmap, or env var that the application reads | Exit code 1, FileNotFoundError or similar in logs | Pod events and container logs |
| Init container failure | Main containers never start; init container exits with an error | kubectl describe pod for init container state |
| Sub-second exit before log flush | Empty logs, exit code present | Structured container status via kubectl get pod -o jsonpath |
| Node resource pressure eviction | Pod terminated by kubelet, status Evicted | Node conditions and kubelet_evictions_total |

A Secret or ConfigMap that the pod spec references but that does not exist usually prevents the container from starting at all (CreateContainerConfigError, or a FailedMount event for volumes) rather than producing an exit code.

Quick checks

Run these checks in order. They are read-only and safe.

# Check pod phase and restart count
kubectl get pod <pod-name> -o jsonpath='{.status.phase}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}'

# Check termination reason and exit code (State: before a restart, Last State: after)
kubectl describe pod <pod-name> | grep -A 5 "State:"

# Logs from the terminated container; add --previous once it has been restarted
kubectl logs <pod-name>
kubectl logs <pod-name> --previous

# Extract structured container status, including exit code and reason
kubectl get pod <pod-name> -o json | jq '.status.containerStatuses[] | {name, state, lastState}'

# Check node-level pressure conditions
kubectl describe node <node-name> | grep -E "MemoryPressure|DiskPressure|PIDPressure"

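# Check for kernel OOM events (run on the node itself, not on your workstation)
sudo dmesg -T | grep -iE "out of memory|oom-kill"

# Inspect restartPolicy and the container command and args
kubectl get pod <pod-name> -o jsonpath='{.spec.restartPolicy}{"\n"}{.spec.containers[*].command}{"\n"}{.spec.containers[*].args}{"\n"}'

# Check if pod was evicted (prints Evicted plus the kubelet's message)
kubectl get pod <pod-name> -o jsonpath='{.status.reason}{"\t"}{.status.message}{"\n"}'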

For a healthy long-running pod, expect phase Running, restartCount 0, and no Terminated state. Problem output depends on the cause: Exit Code: 0 with Reason: Completed suggests a one-shot task running under restartPolicy: Always; Exit Code: 137 with OOMKilled means the kernel OOM killer stopped the container, usually at its memory limit; empty logs with a non-zero exit code usually mean the process exited before it wrote or flushed any output.
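The restart-count check above separates a first-run exit from a restart loop, and can be scripted. This is a sketch assuming a single-container pod in kubectl’s current namespace; classify_exit_pattern is a hypothetical helper name.

```shell
# Sketch: decide whether a pod shows a first-run exit or a restart loop.
# Assumes a single-container pod; classify_exit_pattern is a hypothetical name.
classify_exit_pattern() {
  pod=$1
  restarts=$(kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].restartCount}')
  if [ "${restarts:-0}" -gt 0 ]; then
    # lastState holds only the most recent previous exit, not necessarily the first.
    echo "restarted $restarts time(s): CrashLoopBackOff pattern, inspect lastState"
    return 0
  fi
  # No restart yet: a first-run exit shows up in state.terminated.
  reason=$(kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].state.terminated.reason}')
  if [ -n "$reason" ]; then
    echo "first-run exit: $reason"
  else
    echo "no termination recorded"
  fi
}
```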

How to diagnose it

  1. Confirm a first-run exit. Run kubectl get pod <name>. If RESTARTS is 0 and STATUS is Completed or Error (phase Succeeded or Failed when restartPolicy prevents a restart), the container exited on its first run. If RESTARTS is 1 or higher, the kubelet has already restarted it; treat that as a CrashLoopBackOff pattern instead.

  2. Capture termination metadata immediately. Run kubectl describe pod <name> and look under State: Terminated, or under Last State: Terminated if the container has already been restarted. Record the Reason (Completed, Error, OOMKilled), Exit Code, and Message. To query both records directly, run kubectl get pod <name> -o jsonpath='{.status.containerStatuses[*].state.terminated}{"\n"}{.status.containerStatuses[*].lastState.terminated}'.

  3. Interpret the exit code.

    • 0: The process exited cleanly. If the pod is restarting, check whether restartPolicy is Always when it should be Never or OnFailure.
    • 1: Generic application error. Look for stack traces or configuration failures in the terminated container’s logs.
    • 137 (128 + 9): The process received SIGKILL. If Reason is OOMKilled, the kernel OOM killer ended it; cross-check with container memory limits and node memory pressure. If Reason is Error, something else sent SIGKILL, such as the kubelet after the termination grace period expired.
    • 143 (128 + 15): The process received SIGTERM. This is normal during graceful shutdown but unexpected on startup, where it can point to a failing liveness probe or a pod deletion.
  4. Retrieve logs from the terminated instance. If the container has not been restarted, run kubectl logs <pod-name>; after a restart, run kubectl logs <pod-name> --previous to read the prior instance. Empty output means the process exited before writing to stdout or stderr, or before its own output buffers were flushed. In that case, rely on termination metadata and node-level signals instead.

  5. Check for node-level pressure. Run kubectl describe node <node-name> and look at conditions. MemoryPressure=True means available memory has crossed the kubelet’s eviction threshold, so the kubelet may evict pods. DiskPressure=True can prevent image pulls or log writes. Check the kubelet_evictions_total metric for the specific eviction signal.

  6. Inspect init container state. If the pod STATUS shows Init:Error (or Init:CrashLoopBackOff), an init container exited with a non-zero code. Run kubectl logs <pod-name> -c <init-container-name> to see why. Main containers will not start until all init containers complete successfully.

  7. Correlate with cluster events. Run kubectl get events --field-selector involvedObject.name=<pod-name>. Look for FailedScheduling, FailedMount, FailedCreatePodSandBox, or Killing events that preceded the exit. A Killing event means the kubelet stopped the container itself (for example, after a failed liveness probe, an eviction, or a pod deletion), not that the application crashed.

  8. Compare the API server state to the original spec. Run kubectl get pod <pod-name> -o yaml and compare it against the manifest that created it. Mutating admission webhooks, defaulted fields, or injected sidecars can silently change the effective container command or environment.
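The init container check in step 6 can be scripted so the failing container is named for you. A minimal sketch: failed_init_containers is a hypothetical helper, it reads state.terminated and falls back to lastState.terminated when the init container is already waiting to restart, and it assumes kubectl targets the pod’s namespace.

```shell
# Sketch: list init containers that exited non-zero and print the logs command
# for each. Falls back to lastState.terminated when the init container is
# already waiting to restart. failed_init_containers is a hypothetical name.
failed_init_containers() {
  pod=$1
  # One line per init container: name|current exit code|previous exit code.
  # A non-whitespace separator keeps empty fields from collapsing in `read`.
  kubectl get pod "$pod" -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"|"}{.state.terminated.exitCode}{"|"}{.lastState.terminated.exitCode}{"\n"}{end}' |
    while IFS='|' read -r name code last_code; do
      exit_code=${code:-$last_code}
      if [ -n "$exit_code" ] && [ "$exit_code" -ne 0 ]; then
        echo "$name exited $exit_code; run: kubectl logs $pod -c $name"
      fi
    done
}
```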

flowchart TD
  A["Pod exits immediately (RESTARTS: 0)"] --> B{Exit code?}
  B -->|0| C{"One-shot job with restartPolicy: Always?"}
  B -->|1| D["Application error: check logs --previous"]
  B -->|137| E["OOMKilled or SIGKILL: check memory limits and node pressure"]
  B -->|143| F["SIGTERM on startup: check preStop hooks and grace period"]
  C -->|Yes| G["Change restartPolicy to Never or OnFailure"]
  C -->|No| H["Check for expected clean completion"]
  D --> I["Fix code, config, or missing secrets"]
  E --> J["Raise limits or reduce memory usage"]
  F --> K["Adjust shutdown behavior"]
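The branches of the flowchart can be encoded as a lookup that also uses the Reason field to separate OOM kills from other SIGKILLs. A sketch only: explain_exit_code is a hypothetical helper, and the messages paraphrase this guide rather than any official mapping.

```shell
# Sketch: map an exit code (and optional Reason) to the flowchart branches.
# explain_exit_code is a hypothetical helper name.
# Usage: explain_exit_code <exitCode> [reason]
explain_exit_code() {
  code=$1
  reason=${2:-}
  case "$code" in
    0) echo "clean exit: check restartPolicy if the pod keeps restarting" ;;
    1) echo "application error: read the terminated container's logs" ;;
    137)
      # 128 + 9: SIGKILL. The Reason field separates OOM kills from other kills.
      if [ "$reason" = "OOMKilled" ]; then
        echo "SIGKILL from the OOM killer: check memory limits and node pressure"
      else
        echo "SIGKILL not attributed to OOM: check for forced kills or an expired grace period"
      fi ;;
    143) echo "SIGTERM on startup: check liveness probes, preStop hooks, and grace period" ;;
    *)
      if [ "$code" -gt 128 ]; then
        # Codes above 128 conventionally mean "killed by signal (code - 128)".
        sig=$((code - 128))
        echo "killed by signal $sig ($(kill -l "$sig" 2>/dev/null || echo unknown))"
      else
        echo "application-defined exit code $code"
      fi ;;
  esac
}
```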

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| Pod phase distribution | Reveals pods terminating outside normal churn | Sustained increase in Failed or Succeeded pods that should be Running |
| Container restart count | Lagging indicator of prior exits | Restart count increasing for a stable workload |
| lastState.terminated.reason | Distinguishes OOM, error, and clean completion | OOMKilled or Error in terminal state |
| Node MemoryPressure condition | Signals that the kubelet’s memory eviction threshold has been crossed | MemoryPressure=True on production nodes |
| Container memory working set vs limit | The OOM killer fires when usage reaches the cgroup limit and memory cannot be reclaimed | Working set within 10% of the memory limit |
| kubelet_evictions_total | Kubelet evicts pods to reclaim resources | Any eviction event for non-best-effort workloads |
| kubelet_pleg_relist_duration_seconds | Slow PLEG relists delay state reporting to the API server | p99 relist duration above 5 seconds |
| API server mutating request latency | Slow admission webhooks or etcd delay status updates | p99 mutating latency above 1 second, sustained |
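The working-set row can be checked from inside a container. This sketch assumes cgroup v2 mounted at /sys/fs/cgroup and approximates working set as memory.current minus inactive_file from memory.stat, which is close to how the kubelet reports it; memory_headroom is a hypothetical helper, and the 90% cutoff mirrors the table’s “within 10%” warning sign.

```shell
# Sketch: compare a container's memory working set to its limit (cgroup v2).
# Working set is approximated as memory.current minus inactive_file.
# memory_headroom is a hypothetical helper name.
# Usage: memory_headroom [cgroup_dir]   (defaults to /sys/fs/cgroup)
memory_headroom() {
  dir=${1:-/sys/fs/cgroup}
  limit=$(cat "$dir/memory.max")
  if [ "$limit" = "max" ]; then
    echo "no memory limit set"
    return 0
  fi
  usage=$(cat "$dir/memory.current")
  inactive=$(awk '$1 == "inactive_file" { print $2 }' "$dir/memory.stat")
  working_set=$((usage - ${inactive:-0}))
  if [ "$working_set" -lt 0 ]; then
    working_set=0
  fi
  pct=$((working_set * 100 / limit))
  # Warn when the working set is within 10% of the limit.
  if [ "$pct" -ge 90 ]; then
    status=WARN
  else
    status=OK
  fi
  echo "$status working set ${pct}% of limit ($working_set / $limit bytes)"
}
```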

Fixes

If the cause is a one-shot job with restartPolicy: Always

Deployments only accept restartPolicy: Always, so a container that exits cleanly with code 0 is restarted anyway. Run one-shot work as a Kubernetes Job, whose pod template must set restartPolicy to Never or OnFailure, or as a bare pod with restartPolicy: Never.
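A minimal sketch of the Job alternative, applied from a heredoc. The Job name db-migrate, the image, and the command are hypothetical placeholders; backoffLimit is optional and shown only to cap retries.

```shell
# Sketch: run one-shot work as a Job so a clean exit is not restarted.
# The Job name, image, and command are hypothetical placeholders.
apply_oneshot_job() {
  kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
spec:
  backoffLimit: 2          # retry a failing run at most twice
  template:
    spec:
      restartPolicy: Never # Jobs accept only Never or OnFailure
      containers:
        - name: migrate
          image: registry.example.com/migrate:1.0
          command: ["/bin/migrate", "--up"]
EOF
}
```

With restartPolicy: Never, a failed run produces a new pod rather than an in-place restart, so each attempt keeps its own termination record.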

If the cause is OOMKilled

Increase the container memory limit, or reduce the application’s memory footprint. For Java applications, ensure the max heap size leaves headroom for native memory and the container overhead. If the node itself is under MemoryPressure, scale the node pool or evict heavy best-effort pods.
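Both adjustments can be made with kubectl set. A sketch only: raise_memory_limit is a hypothetical helper, 75.0 is an example MaxRAMPercentage rather than a recommendation, and both commands change the pod template, so each triggers a rollout.

```shell
# Sketch: raise a Deployment container's memory limit and, for JVM workloads,
# size the heap relative to that limit. raise_memory_limit is hypothetical;
# 75.0 is an example MaxRAMPercentage that leaves headroom for native memory.
# Both commands change the pod template, so each triggers a new rollout.
# Usage: raise_memory_limit <deployment> <container> <limit> [jvm]
raise_memory_limit() {
  deploy=$1
  container=$2
  limit=$3
  kubectl set resources "deployment/$deploy" -c "$container" --limits="memory=$limit"
  if [ "${4:-}" = "jvm" ]; then
    # Overwrites any existing JAVA_TOOL_OPTIONS value on this container.
    kubectl set env "deployment/$deploy" -c "$container" JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=75.0"
  fi
}
```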

If the cause is an application startup error

Read the terminated container’s logs (kubectl logs, or kubectl logs --previous after a restart) to find the stack trace, missing file, or configuration failure. Verify that ConfigMaps, Secrets, and environment variables referenced in the pod spec exist and are mounted correctly. Fix the application code or container image.
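Checking that referenced Secrets and ConfigMaps exist can be scripted. This sketch assumes kubectl targets the pod’s namespace and only inspects regular containers and plain secret/configMap volumes (not init containers or projected volumes); missing_refs is a hypothetical helper name.

```shell
# Sketch: report Secrets and ConfigMaps that a pod references but that do
# not exist. Covers regular containers plus secret/configMap volumes only.
# missing_refs is a hypothetical helper name.
missing_refs() {
  pod=$1
  secrets=$(kubectl get pod "$pod" -o jsonpath='{.spec.volumes[*].secret.secretName} {.spec.containers[*].env[*].valueFrom.secretKeyRef.name} {.spec.containers[*].envFrom[*].secretRef.name}')
  configmaps=$(kubectl get pod "$pod" -o jsonpath='{.spec.volumes[*].configMap.name} {.spec.containers[*].env[*].valueFrom.configMapKeyRef.name} {.spec.containers[*].envFrom[*].configMapRef.name}')
  # Word-split the space-separated names, de-duplicate, and check each one.
  for name in $(printf '%s\n' $secrets | sort -u); do
    kubectl get secret "$name" >/dev/null 2>&1 || echo "missing secret: $name"
  done
  for name in $(printf '%s\n' $configmaps | sort -u); do
    kubectl get configmap "$name" >/dev/null 2>&1 || echo "missing configmap: $name"
  done
}
```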

If the cause is an init container failure

Run kubectl logs <pod> -c <init-container> to capture the init container’s output. Fix the initialization script, dependency, or command. Init container restarts are counted separately and can block the main pod indefinitely.

If the cause is node resource pressure

Warning: Disruptive. Cordoning prevents new pods from scheduling to the node; existing pods keep running. Run kubectl uncordon <node-name> once the node recovers.

Cordon the node, then free disk space, remove unused images, or add nodes to the pool. Set resource requests and limits on all workloads so the scheduler and kubelet can make informed eviction decisions.
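A sketch that cordons a node only when it actually reports a pressure condition. cordon_if_pressured is a hypothetical helper; review it before wiring it into automation, since cordoning changes where the scheduler places new pods.

```shell
# Sketch: cordon a node only if it reports MemoryPressure, DiskPressure, or
# PIDPressure. cordon_if_pressured is a hypothetical helper name.
# Returns 0 if it cordoned the node, 1 if no pressure condition was True.
cordon_if_pressured() {
  node=$1
  for cond in MemoryPressure DiskPressure PIDPressure; do
    status=$(kubectl get node "$node" -o jsonpath="{.status.conditions[?(@.type==\"$cond\")].status}")
    if [ "$status" = "True" ]; then
      echo "$node reports $cond=True; cordoning"
      kubectl cordon "$node"
      return 0
    fi
  done
  echo "$node reports no pressure conditions"
  return 1
}
```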

If the cause is missing log output

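If the container exits before its output reaches the log, disable output buffering (for example, PYTHONUNBUFFERED=1 for Python) or flush stdout and stderr before exiting, as a temporary debugging measure. Alternatively, write a short fatal-error message to the termination message path (/dev/termination-log by default) so kubectl describe pod surfaces it under Message without relying on log buffers.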
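One way to get a termination message without changing the application is a wrapper entrypoint. This is a debugging sketch, not a production entrypoint: run_with_termination_message and the TERMINATION_LOG override are hypothetical, stderr is buffered and replayed only after the command exits, and signals are not forwarded to the child.

```shell
# Sketch: run a command and, on a non-zero exit, write the exit code and the
# last 20 lines of stderr to the termination message file (Kubernetes reads
# /dev/termination-log by default). TERMINATION_LOG is a hypothetical override.
# Limitations: stderr is replayed after exit; signals are not forwarded.
run_with_termination_message() {
  term_log=${TERMINATION_LOG:-/dev/termination-log}
  err_file=$(mktemp)
  rc=0
  "$@" 2>"$err_file" || rc=$?
  if [ "$rc" -ne 0 ]; then
    # Keep the message short; the kubelet truncates long termination messages.
    {
      echo "exit code $rc: $*"
      tail -n 20 "$err_file"
    } >"$term_log"
  fi
  # Replay stderr so it still reaches the container log.
  cat "$err_file" >&2
  rm -f "$err_file"
  return "$rc"
}
```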

Prevention

  • Match restartPolicy to workload type. Use Always for long-running services, OnFailure for batch jobs, and Never for one-shot tasks.
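  • Set memory requests and limits. A limit confines an out-of-memory kill to the container that exceeded its own budget, and requests give the scheduler the data it needs for placement and influence which pods are evicted or OOM-killed first under node pressure.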
  • Use startup probes for slow-starting containers. Do not use liveness probes to catch startup failures; a failing liveness probe on a container that is still initializing causes unnecessary restarts.
  • Monitor pod phase distribution and container restart counts. Baseline these metrics per workload so you can detect a sudden shift to Failed or Succeeded.
  • Write termination messages. Configure terminationMessagePath and terminationMessagePolicy (for example, FallbackToLogsOnError) so fatal application errors surface in kubectl describe pod even when logs are empty.
  • Include log capture in runbooks. Operators should run kubectl logs (and kubectl logs --previous after a restart) as soon as they detect an unexpected exit, because only one previous instance is kept and each further restart replaces it.
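A runbook step that snapshots the evidence in one pass might look like the sketch below. collect_exit_evidence and its default output directory are hypothetical, and it assumes kubectl already targets the pod’s namespace.

```shell
# Sketch of a runbook step: snapshot everything needed to diagnose an exit
# before further restarts replace it. collect_exit_evidence is hypothetical.
# Usage: collect_exit_evidence <pod-name> [output-dir]
collect_exit_evidence() {
  pod=$1
  dir=${2:-./exit-evidence-$pod}
  mkdir -p "$dir"
  # Each command may fail (for example, --previous when there was no restart);
  # keep going so one missing artifact does not lose the rest.
  kubectl describe pod "$pod" >"$dir/describe.txt" 2>&1 || true
  kubectl get pod "$pod" -o yaml >"$dir/pod.yaml" 2>&1 || true
  kubectl logs "$pod" --all-containers >"$dir/logs.txt" 2>&1 || true
  kubectl logs "$pod" --all-containers --previous >"$dir/logs-previous.txt" 2>&1 || true
  kubectl get events --field-selector "involvedObject.name=$pod" >"$dir/events.txt" 2>&1 || true
  echo "$dir"
}
```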

How Netdata helps

  • Netdata collects kubelet metrics such as kubelet_running_pods, kubelet_container_start_duration_seconds, and kubelet_evictions_total to correlate pod exits with node-level events.
  • Per-container cgroup memory charts show working set growth approaching the limit before the OOM killer triggers.
  • Node condition alerts for MemoryPressure and DiskPressure trigger before the kubelet begins evicting pods.
  • API server latency monitoring detects slow admission webhooks or etcd disk latency that delays pod status updates and masks the true timing of a container exit.