
Kubernetes kubelet volume deadlock: detection and recovery

When pods hang in ContainerCreating or Terminating while the node stays Ready=True, the kubelet volume manager is often the culprit. A blocked mount, unmount, attach, or detach operation consumes a goroutine in the volume manager’s finite pool. Once enough operations hang, the queue saturates. Subsequent pods needing volumes stall indefinitely, while pods without volumes start normally. The scheduler, seeing a healthy node, may keep placing volume-bound pods onto the saturated node, deepening the backlog. This guide covers how to confirm a volume deadlock, distinguish it from slow storage, and recover safely.

What this means

The kubelet volume manager reconciles desired and actual volume state using a finite pool of goroutines. Mount, unmount, attach, and detach are blocking operations. If one hangs – for example, an unreachable NFS server or a deadlocked CSI driver call – that goroutine is stuck until the kernel or driver returns. Once enough goroutines block, the volume manager cannot process new requests. Volume-dependent pods stall, but non-volume pods start normally. The node stays Ready because the main sync loop and PLEG are unaffected. The scheduler may continue placing volume-bound pods on the node, worsening the backlog.

```mermaid
flowchart TD
    A[Hung mount or unmount] --> B[Volume manager goroutine blocked]
    B --> C[Operation queue saturates]
    C --> D[New volume operations stall]
    D --> E[Pods stuck in ContainerCreating or Terminating]
    E --> F[Node Ready remains True]
    F --> G[Cluster assumes node is healthy]
```

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| NFS or network storage unreachable | Pod stuck in `ContainerCreating`, no `FailedMount` event yet; mount command hangs on the node | Manual mount attempt from the node |
| CSI driver crash or deadlock | CSI driver pods not ready; terminating pods stuck waiting for unmount | CSI node driver pod health on the affected node |
| Cloud API throttling or outage | `FailedAttachVolume` events; multi-attach errors after node failure | `VolumeAttachment` objects and cloud provider status |
| Unclean detach from previous node | Volume still attached to a crashed or deleted node; new pod cannot mount | `VolumeAttachment` for the volume and previous node |
| Filesystem check during mount | Mount latency spikes after node reboot; pod events show attach succeeded but mount is slow | `dmesg` or filesystem journal on the node |
| Orphaned volume directories | Force-deleted pod leaves dangling data; subsequent mounts fail with path conflicts | Kubelet pod volume directories for deleted UIDs |
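Several of these causes start with a manual mount attempt from the node. Against an unreachable server that attempt can block the shell running it, so one option is to bound it with GNU coreutils `timeout`. A minimal sketch follows: `probe_cmd` is a hypothetical helper, the 5-second kill grace period is an arbitrary choice, and a process stuck in uninterruptible sleep may survive even `SIGKILL`, so run the probe from a spare shell.

```shell
# probe_cmd (hypothetical helper): runs a command under GNU `timeout` and
# reports ok / failed / hung instead of blocking the calling shell.
# Arg 1: seconds to wait; remaining args: the command to run.
probe_cmd() {
  local secs=$1 rc
  shift
  rc=0
  # Send TERM after $secs seconds, then KILL 5 seconds later if still alive.
  timeout -k 5 "$secs" "$@" || rc=$?
  case $rc in
    0) echo "ok" ;;
    124|137) echo "hung (no result after ${secs}s)" ;;  # timed out or killed
    *) echo "failed (exit $rc)" ;;
  esac
  return "$rc"
}

# Usage on the node (placeholders in angle brackets):
#   mkdir -p /tmp/test
#   probe_cmd 30 mount -t nfs -o ro,soft <server>:<export> /tmp/test && umount /tmp/test
```

A `hung` result points at the storage backend or network path rather than at kubelet; `failed` usually carries an error message from `mount` worth reading.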

Quick checks

Run these in order. Prefer read-only checks before any state-changing action.

```shell
# Check pods stuck in ContainerCreating or Terminating (shown in the STATUS column)
kubectl get pods --all-namespaces -o wide | grep -E "ContainerCreating|Terminating"

# Check for volume-related events
kubectl get events --all-namespaces | grep -E "FailedMount|FailedAttachVolume|AttachVolume"

# Check VolumeAttachment status for stale attachments
kubectl get volumeattachments -o wide

# Check kubelet storage operation metrics
kubectl get --raw "/api/v1/nodes/<node>/proxy/metrics" | grep storage_operation_duration_seconds

# On the node: check current mounts and look for duplicates or stale entries
mount | grep -E "/var/lib/kubelet/pods|kubernetes.io"

# Check CSI node driver pods on the affected node (namespace and label vary by driver)
kubectl get pods -n kube-system -l app=<csi-driver-node> --field-selector=spec.nodeName=<node>

# On the node: check kubelet logs for volume errors
journalctl -u kubelet --since "10 minutes ago" | grep -iE "volume|mount|attach|unmount"

# On the node: check disk and inode usage for the kubelet root directory
df -h /var/lib/kubelet
df -i /var/lib/kubelet
```

Destructive or disruptive actions (use with caution):

- Restarting kubelet or CSI driver pods is state-changing and should only follow read-only diagnosis.
- Deleting a `VolumeAttachment` can leave cloud resources in an inconsistent state if the underlying disk is still attached.

How to diagnose it

1. Identify the stuck pod and its node. Look for pods in `ContainerCreating` or `Terminating` for longer than 5 minutes and note the node name.
2. Check pod events for volume-specific errors. Run `kubectl describe pod <name>` and look for `FailedMount` or `FailedAttachVolume` events, including messages such as `MountVolume.SetUp failed`. A pod that stays stuck with no volume events at all suggests a hung operation rather than a failed one.
3. Verify the node is otherwise healthy. Run `kubectl get node <node>`. If it reports `Ready=True` and pods without volumes start normally, the problem is isolated to the volume manager rather than kubelet as a whole.
4. Inspect storage operation latency. Query `storage_operation_duration_seconds` from the kubelet metrics endpoint; operations exceeding 2 minutes are abnormal. The histogram is recorded only when an operation completes, so a hung operation never appears in it. If the metric shows no recent operations while new pods are pending, operations are either hung or have not started because the queue is saturated.
5. Check for stale VolumeAttachments. Run `kubectl get volumeattachments`. If a volume is still attached to a deleted node, or to a node no longer running the pod, the attach-detach controller waits for an unmount that will never complete, blocking reattachment.
6. Test storage connectivity from the node. For NFS, attempt a manual mount to a scratch directory such as `/tmp/test`. For block storage, verify that the device path exists and is not busy. A manual command that hangs points to the storage backend or network path as the root cause.
7. Check CSI driver health. If the volume is CSI-backed, verify that the CSI node driver pod on the affected node is running and that its socket is responsive. A crashed driver blocks both mount and unmount operations.
8. Look for orphaned volume directories. Check the kubelet pod directory (`/var/lib/kubelet/pods` by default) for UIDs of long-deleted pods. A dangling volume directory for a reused volume can make mount setup fail on path conflicts.
9. Determine whether the volume manager is saturated. If several unrelated pods with different volumes are stuck on the same node while non-volume pods start fine, the goroutine pool is likely saturated, rather than a single volume being slow.
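The saturation check in the last step can be approximated from the API. A minimal sketch, assuming a pod counts as stuck when one of its containers is waiting with reason `ContainerCreating` or the pod has a `deletionTimestamp`; the helper name `stuck_pods_by_node` is hypothetical. It takes a single snapshot and ignores pod age, so run it twice a few minutes apart: counts that persist and concentrate on one node fit the saturation pattern.

```shell
# stuck_pods_by_node (hypothetical helper): reads "NODE WAITING DELETING"
# lines from the kubectl custom-columns query below and prints
# "<count> <node>" for each node with stuck pods, busiest node first.
stuck_pods_by_node() {
  awk '
    # Skip pods that are not bound to a node.
    $1 == "<none>" { next }
    # Column 2: container waiting reasons joined by commas, or <none>.
    # Column 3: deletion timestamp, or <none> if the pod is not being deleted.
    ("," $2 ",") ~ /,ContainerCreating,/ || $3 != "<none>" { stuck[$1]++ }
    END { for (node in stuck) print stuck[node], node }
  ' | sort -k1,1nr -k2,2
}

# Usage (requires cluster access):
#   kubectl get pods --all-namespaces --no-headers \
#     -o custom-columns='NODE:.spec.nodeName,WAITING:.status.containerStatuses[*].state.waiting.reason,DELETING:.metadata.deletionTimestamp' \
#     | stuck_pods_by_node
```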

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| `storage_operation_duration_seconds` | Tracks mount, unmount, attach, and detach latency | p99 above 30 seconds or operations exceeding 2 minutes |
| `storage_operation_errors_total` | Count of failed storage operations | Sustained nonzero rate on a node |
| `kubelet_running_pods` vs desired | Reveals pods that never started due to blocked mounts | Growing gap on a specific node |
| `VolumeAttachment` status | Stale attachments block reattachment and new mounts | Attachment to a deleted or NotReady node |
| Pod startup latency | Volume mounts dominate startup time for stateful workloads | p99 above 60 seconds for pods with volumes |
| Node `DiskPressure` | Full disks prevent image pulls and volume expansions | Condition `True` or inode usage above 85 percent |
| Node `Ready` condition | Remains `True` during volume deadlocks | Used to distinguish volume manager failure from full node failure |
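To read `storage_operation_duration_seconds` without a monitoring stack, one rough approach is to divide each series' `_sum` by its `_count` from the kubelet metrics endpoint. This is a sketch with a hypothetical helper name, `storage_op_means`: the result is a lifetime mean since kubelet started, which hides recent spikes, and because the histogram is only observed when an operation finishes, an operation that is still hung does not appear in it at all.

```shell
# storage_op_means (hypothetical helper): reads Prometheus text-format metrics
# on stdin and prints "<mean seconds> <label set>" for every
# storage_operation_duration_seconds series, slowest first.
storage_op_means() {
  awk '
    /^storage_operation_duration_seconds_(sum|count)[{]/ {
      name = $0;   sub(/[{].*/, "", name)                           # metric name
      labels = $0; sub(/^[^{]*[{]/, "", labels); sub(/[}].*/, "", labels)
      if (name ~ /_sum$/) total[labels] = $NF; else count[labels] = $NF
    }
    END {
      for (l in count)
        if (count[l] + 0 > 0) printf "%.3f %s\n", total[l] / count[l], l
    }
  ' | sort -k1,1nr
}

# Usage:
#   kubectl get --raw "/api/v1/nodes/<node>/proxy/metrics" | storage_op_means
```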

Fixes

If the cause is a hung NFS or network mount

Do not restart kubelet first. A hung mount may be uninterruptible. Check for processes in state `D` with `ps -eo stat,comm | grep '^D'`. If the NFS server recovers, the mount may complete on its own. If the server is permanently gone, cordon the node so no new volume-bound pods land on it, force-unmount with `umount -f` or lazy-unmount with `umount -l`, then drain the pods that use volumes. Forced unmounts risk data corruption for in-flight writes.
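A slightly richer view of the `D`-state check adds the PID and the kernel wait channel, which for a hung NFS mount often names an RPC or NFS wait function. A minimal sketch: `list_dstate` is a hypothetical filter, and the unmount paths in the comments assume the default kubelet root directory and the in-tree NFS volume layout.

```shell
# list_dstate (hypothetical helper): keeps the ps header line plus every
# process whose STAT column starts with D (uninterruptible sleep).
list_dstate() {
  awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on the node:
#   ps -eo pid,stat,wchan:32,args | list_dstate
# If the server is permanently gone, after cordoning the node:
#   umount -f /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~nfs/<volume-name>
#   umount -l /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~nfs/<volume-name>  # if -f does not return
```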

If the cause is a CSI driver failure

Restart the CSI node driver pod on the affected node. If the driver is in `CrashLoopBackOff`, fix its configuration or resource limits first, because a restart alone will not help. If the driver is running but deadlocked, delete the pod; the DaemonSet recreates it, which often restores the socket. Verify re-registration by checking kubelet logs for registration messages.
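As a complement to the kubelet log check, the `CSINode` object for the node lists the drivers kubelet has registered there. A minimal sketch: the helper `csi_driver_registered` is hypothetical, and the namespace, pod, and driver names in the comments are placeholders that vary by driver.

```shell
# csi_driver_registered (hypothetical helper): succeeds if DRIVER appears in
# the space-separated list of driver names reported by a CSINode object.
csi_driver_registered() {
  local drivers=$1 driver=$2 d
  # Intentional word splitting: driver names contain no spaces or glob characters.
  for d in $drivers; do
    [ "$d" = "$driver" ] && return 0
  done
  return 1
}

# Usage (placeholders in angle brackets):
#   kubectl delete pod -n <csi-namespace> <csi-node-pod-on-node>
#   drivers=$(kubectl get csinode <node> -o jsonpath='{.spec.drivers[*].name}')
#   csi_driver_registered "$drivers" <driver-name> && echo "registered"
```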

If the cause is a stale VolumeAttachment

If a volume is attached to a node that no longer exists or no longer needs it, and the cloud provider has not detached it, delete the stale VolumeAttachment object. Only do this after confirming the disk is no longer in use by the previous node. Forcing deletion may leave the cloud disk in an inconsistent state depending on the CSI driver implementation. After deletion, the attach-detach controller should reattach the volume to the correct node.
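Candidates for this fix can be listed by comparing the node each `VolumeAttachment` points at with the nodes that still exist. A minimal sketch with a hypothetical helper name, `stale_attachments`: it only finds attachments to deleted nodes (not to live nodes that no longer need the volume), and confirming with the cloud provider that the disk is really free remains a manual step before any `kubectl delete volumeattachment`.

```shell
# stale_attachments (hypothetical helper): prints VolumeAttachment lines whose
# node is not in the list of existing nodes.
# Arg 1: space-separated names of nodes that still exist.
# Stdin: "VA_NAME NODE PV_NAME" lines from the kubectl query below.
stale_attachments() {
  awk -v existing="$1" '
    BEGIN { n = split(existing, names); for (i = 1; i <= n; i++) known[names[i]] = 1 }
    !($2 in known) { print }
  '
}

# Usage (requires cluster access):
#   nodes=$(kubectl get nodes -o jsonpath='{.items[*].metadata.name}')
#   kubectl get volumeattachments --no-headers \
#     -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName,PV:.spec.source.persistentVolumeName' \
#     | stale_attachments "$nodes"
```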

If the cause is orphaned volume directories

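After confirming the pod is fully deleted from the API server and no processes are using the volume, remove the orphaned pod volume directory under the kubelet pods path (`/var/lib/kubelet/pods/<pod-uid>/volumes` with the default kubelet root directory). Confirm first that nothing is still mounted beneath it: deleting through a live mount deletes data on the backing volume. Do not delete directories for pods still visible in `kubectl get pods`, even if they are in `Terminating`.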
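A minimal sketch for finding orphan candidates compares the pod UID directories on the node with the UIDs the API server knows about. The helper name `orphaned_pod_dirs` is hypothetical and the default path assumes the standard kubelet root directory. Static pods can show up as false positives, because their on-node directory may not match the mirror pod's API UID, and pods created between the two listings can show up too, so review every entry by hand.

```shell
# orphaned_pod_dirs (hypothetical helper): lists pod directories whose UID is
# not in the space-separated list of UIDs known to the API server.
# Arg 1: known UIDs; arg 2 (optional): kubelet pods path.
orphaned_pod_dirs() {
  local known=" $1 " pods_dir=${2:-/var/lib/kubelet/pods} dir uid
  for dir in "$pods_dir"/*/; do
    [ -d "$dir" ] || continue            # glob matched nothing
    uid=$(basename "$dir")
    case "$known" in
      *" $uid "*) ;;                     # pod still exists in the API
      *) printf '%s\n' "${dir%/}" ;;
    esac
  done
}

# Usage on the node (kubectl access needed for the UID list):
#   uids=$(kubectl get pods --all-namespaces -o jsonpath='{.items[*].metadata.uid}')
#   for d in $(orphaned_pod_dirs "$uids"); do
#     if findmnt -rn -o TARGET | grep -qF "$d/"; then echo "still mounted: $d"
#     else echo "candidate: $d"; fi
#   done
```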

If the cause is volume manager saturation

Cordon the node to stop new scheduling. Evict or delete pods stuck in `ContainerCreating` if they are safe to recreate elsewhere. If terminating pods hang on unmount failures and the storage backend cannot recover quickly, restarting kubelet is a last resort. It clears the in-memory volume operation queue, but it is disruptive: running containers normally survive a kubelet restart, yet every pod on the node is reconciled again and kubelet retries outstanding mounts and unmounts, so the queue can saturate again unless the storage problem is resolved.
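To keep the eviction step deliberate, a sketch can print one `kubectl delete pod` command per pod stuck in `ContainerCreating` on the cordoned node, for review before anything runs. The helper name `stuck_pod_delete_cmds` and the column layout are assumptions. Pods owned by a StatefulSet come back with the same name and may still depend on a volume attached to this node, so check ownership before running the commands.

```shell
# stuck_pod_delete_cmds (hypothetical helper): reads
# "NAMESPACE NAME NODE WAITING" lines and prints a delete command for each pod
# on the given node whose containers are waiting in ContainerCreating.
# It only prints commands; nothing is deleted.
stuck_pod_delete_cmds() {
  awk -v node="$1" '
    $3 == node && ("," $4 ",") ~ /,ContainerCreating,/ {
      printf "kubectl delete pod -n %s %s\n", $1, $2
    }'
}

# Usage (requires cluster access):
#   kubectl cordon <node>
#   kubectl get pods --all-namespaces --no-headers \
#     -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,NODE:.spec.nodeName,WAITING:.status.containerStatuses[*].state.waiting.reason' \
#     | stuck_pod_delete_cmds <node>
```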

Prevention

- Monitor storage operation latency. Alert on `storage_operation_duration_seconds` p99 above 30 seconds or on any operation exceeding 2 minutes.
- Avoid force-deleting pods with volumes. Use graceful deletion with an adequate `terminationGracePeriodSeconds` to allow clean unmounts.
- Ensure CSI drivers are well-provisioned. CSI node driver pods should have sufficient CPU and memory, and their DaemonSet should use a PodDisruptionBudget where appropriate.
- Check VolumeAttachments before node deletions. When removing a node from the cluster, verify that its volumes are detached, with no `VolumeAttachment` objects still referencing it, before deleting the node object.
- Set mount timeout policies where possible. For NFS, use the `soft` mount option with bounded `timeo` (in tenths of a second) and `retrans` values to prevent indefinite hangs, accepting that soft mounts can return I/O errors to applications and risk data inconsistency.
- Monitor inode usage. Volume metadata and logs can exhaust inodes before disk space, so track `df -i` alongside `df -h`.
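For the NFS item, mount options can be set on the PersistentVolume through `spec.mountOptions`. A minimal sketch that prints such a manifest: the helper name `nfs_pv_manifest`, the server, the path, and the `timeo=100` (10 seconds) and `retrans=3` values are illustrative placeholders, not tuned recommendations.

```shell
# nfs_pv_manifest (hypothetical helper): prints a PersistentVolume that mounts
# an NFS export with soft, time-bounded options.
# Args: PV name, NFS server, export path, capacity.
# timeo is in tenths of a second; with "soft", the client returns an I/O error
# after retrans retransmissions instead of retrying indefinitely.
nfs_pv_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: $1
spec:
  capacity:
    storage: $4
  accessModes:
    - ReadWriteMany
  mountOptions:
    - soft
    - timeo=100
    - retrans=3
  nfs:
    server: $2
    path: $3
EOF
}

# Usage (placeholder values):
#   nfs_pv_manifest shared-data nfs.example.internal /exports/data 10Gi | kubectl apply -f -
```

Mount options apply the next time the volume is mounted, so existing pods keep their current options until they are recreated.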

How Netdata helps

Netdata correlates storage_operation_duration_seconds with pod startup latency to isolate volume bottlenecks. It alerts on per-node volume operation latency spikes before queue saturation, tracks node-level disk pressure and inode exhaustion alongside kubelet volume metrics, visualizes the gap between desired and running pods per node to detect backlog, and surfaces kubelet error log patterns for mount and attach failures without manual log scanning.

The Netdata solution

Kubernetes monitoring with Netdata

Netdata monitors Kubernetes with per-second metrics across the control plane, nodes, and every pod, with ML anomaly detection and zero per-pod configuration. Correlate API-server and etcd latency, kubelet PLEG stalls, scheduling pressure, and OOMKills in one place.
