Kubernetes kubelet volume deadlock: detection and recovery
When pods hang in ContainerCreating or Terminating while the node stays Ready=True, the kubelet volume manager is often the culprit. A blocked mount, unmount, attach, or detach operation consumes a goroutine in the volume manager’s finite pool. Once enough operations hang, the queue saturates. Subsequent pods needing volumes stall indefinitely, while pods without volumes start normally. The scheduler, seeing a healthy node, may keep placing volume-bound pods onto the saturated node, deepening the backlog. This guide covers how to confirm a volume deadlock, distinguish it from slow storage, and recover safely.
What this means
The kubelet volume manager reconciles desired and actual volume state using a finite pool of goroutines. Mount, unmount, attach, and detach are blocking operations. If one hangs – for example, an unreachable NFS server or a deadlocked CSI driver call – that goroutine is stuck until the kernel or driver returns. Once enough goroutines block, the volume manager cannot process new requests. Volume-dependent pods stall, but non-volume pods start normally. The node stays Ready because the main sync loop and PLEG are unaffected. The scheduler may continue placing volume-bound pods on the node, worsening the backlog.
```mermaid
flowchart TD
    A[Hung mount or unmount] --> B[Volume manager goroutine blocked]
    B --> C[Operation queue saturates]
    C --> D[New volume operations stall]
    D --> E[Pods stuck in ContainerCreating or Terminating]
    E --> F[Node Ready remains True]
    F --> G[Cluster assumes node is healthy]
```
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| NFS or network storage unreachable | Pod stuck in ContainerCreating, no FailedMount event yet; mount command hangs on the node | Manual mount attempt from the node |
| CSI driver crash or deadlock | CSI driver pods not ready; terminating pods stuck waiting for unmount | CSI node driver pod health on the affected node |
| Cloud API throttling or outage | FailedAttachVolume events; multi-attach errors after node failure | VolumeAttachment objects and cloud provider status |
| Unclean detach from previous node | Volume still attached to a crashed or deleted node; new pod cannot mount | VolumeAttachment for the volume and previous node |
| Filesystem check during mount | Mount latency spikes after node reboot; pod events show attach succeeded but mount is slow | dmesg or filesystem journal on the node |
| Orphaned volume directories | Force-deleted pod leaves dangling data; subsequent mounts fail with path conflicts | Kubelet pod volume directories for deleted UIDs |
Quick checks
Run these in order. Prefer read-only checks before any state-changing action.
```bash
# Pods stuck in ContainerCreating (phase Pending) or Terminating (any phase)
kubectl get pods --all-namespaces --field-selector=status.phase=Pending -o wide
kubectl get pods --all-namespaces -o wide | grep Terminating

# Volume-related events
kubectl get events --all-namespaces | grep -E "FailedMount|FailedAttachVolume|AttachVolume"

# VolumeAttachment status for stale attachments
kubectl get volumeattachments -o wide

# Kubelet storage operation metrics (replace <node>)
kubectl get --raw "/api/v1/nodes/<node>/proxy/metrics" | grep storage_operation_duration_seconds

# On the node: current mounts, looking for duplicates or stale entries
mount | grep -E "/var/lib/kubelet/pods|kubernetes.io"

# CSI driver pods on the affected node (namespace and label depend on the driver)
kubectl get pods -n kube-system -l app=<csi-driver-node> --field-selector=spec.nodeName=<node>

# On the node: kubelet logs for volume errors
journalctl -u kubelet --since "10 minutes ago" | grep -iE "volume|mount|attach|unmount"

# On the node: disk and inode usage for the kubelet root
df -h /var/lib/kubelet
df -i /var/lib/kubelet
```
Destructive or disruptive commands (use with caution):
- Restarting kubelet or CSI driver pods is state-changing and should only follow read-only diagnosis.
- Deleting a `VolumeAttachment` can leave cloud resources in an inconsistent state if the underlying disk is still attached.
How to diagnose it
1. Identify the stuck pod and its node. Look for pods in `ContainerCreating` or `Terminating` for longer than 5 minutes, and note the node name.
2. Check pod events for volume-specific errors. Run `kubectl describe pod <name>` and look for `FailedMount` or `FailedAttachVolume` events, or `MountVolume.SetUp failed` messages. A pod that stays stuck without any such events usually indicates a hung operation rather than a failed one.
3. Verify the node is otherwise healthy. Run `kubectl get node <node>`. If it reports `Ready=True` and pods without volumes start normally, the problem is isolated to the volume manager rather than general kubelet health.
4. Inspect storage operation latency. Query `storage_operation_duration_seconds` from the kubelet metrics endpoint. Operations exceeding 2 minutes are abnormal. The histogram records an operation only when it completes, so if it shows no recent operations while new pods are pending, the operations are either hung or have not started because the queue is saturated.
5. Check for stale VolumeAttachments. Run `kubectl get volumeattachments`. If a volume is still attached to a deleted node, or to a node no longer running the pod, the attach-detach controller waits for an unmount that will never complete, blocking reattachment.
6. Test storage connectivity from the node. For NFS, attempt a manual mount to `/tmp/test`. For block storage, verify the device path exists and is not busy. A manual command that hangs points to the storage backend or network path as the root cause.
7. Check CSI driver health. If the volume is CSI-backed, verify that the CSI node driver pod on the affected node is running and its socket is responsive. A crashed driver blocks both mount and unmount operations.
8. Look for orphaned volume directories. Check kubelet's pod directory for UIDs of long-deleted pods. A dangling volume directory for a reused volume can cause mount setup to fail on path conflicts.
9. Determine whether the volume manager is saturated. If multiple unrelated pods with different volumes are stuck on the same node while pods without volumes start fine, the goroutine pool is likely saturated rather than a single volume being slow.
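The saturation check in the last diagnosis step can be scripted across the cluster. The sketch below uses a hypothetical helper, `stuck_pods_by_node`, that counts pods per node which look stuck on volumes: any container waiting with reason `ContainerCreating`, or a deletion timestamp set (Terminating). It assumes input in the three-column `kubectl` custom-columns shape shown in the usage comment. It does not apply the 5-minute age threshold, so normal churn can add small counts; a node whose count is far above its peers is the one to inspect.

```bash
# Sketch: count pods that look stuck on volumes, grouped by node.
# stdin, one pod per line (whitespace-separated):
#   <node> <container waiting reasons, comma-joined, or <none>> <deletionTimestamp or <none>>
stuck_pods_by_node() {
  awk '
    # Skip pods that are not bound to a node yet.
    $1 == "<none>" { next }
    # Stuck = a container waiting in ContainerCreating, or the pod is being deleted.
    $2 ~ /ContainerCreating/ || $3 != "<none>" { count[$1]++ }
    END { for (node in count) print node, count[node] }
  ' | sort -k2,2nr -k1,1
}

# Usage:
#   kubectl get pods --all-namespaces --no-headers \
#     -o custom-columns='NODE:.spec.nodeName,WAITING:.status.containerStatuses[*].state.waiting.reason,DELETED:.metadata.deletionTimestamp' \
#   | stuck_pods_by_node
```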
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| storage_operation_duration_seconds | Tracks mount, unmount, attach, and detach latency | p99 above 30 seconds or operations exceeding 2 minutes |
| storage_operation_errors_total | Count of failed storage operations | Sustained nonzero rate on a node |
| kubelet_running_pods vs pods scheduled to the node | Reveals pods that never started due to blocked mounts | Growing gap on a specific node |
| VolumeAttachment status | Stale attachments block reattachment and new mounts | Attachment to a deleted or NotReady node |
| Pod startup latency | Volume mounts dominate startup time for stateful workloads | p99 above 60 seconds for pods with volumes |
| Node DiskPressure | Full disks prevent image pulls and volume expansions | Condition True or inode usage above 85 percent |
| Node Ready condition | Remains True during volume deadlocks | Used to distinguish volume manager failure from full node failure |
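The "operations exceeding 2 minutes" warning sign can be read off the histogram itself: for each series, the number of operations slower than a bucket boundary is `_count` minus that bucket's cumulative value. The sketch below, a hypothetical `slow_storage_ops` helper, sums this per `operation_name` from the kubelet's Prometheus text output. It assumes the kubelet exposes a bucket whose `le` label equals the threshold; check the `le` values in your metrics output first, because if that bucket is missing every operation is counted as slow. Values are cumulative since kubelet start, and operations that are still hung never appear, since a histogram only records a duration once the operation finishes.

```bash
# Sketch: from kubelet Prometheus text on stdin, print "<operation_name> <count>"
# for operations slower than THRESHOLD seconds (default 120).
# Assumes a bucket with le="<THRESHOLD>" exists; counts are cumulative since kubelet start.
slow_storage_ops() {
  local threshold="${1:-120}"
  awk -v le="le=\"${threshold}\"" '
    # Extract the operation_name label value, or "unknown" if absent.
    function opname(line) {
      if (match(line, /operation_name="[^"]*"/))
        return substr(line, RSTART + 16, RLENGTH - 17)
      return "unknown"
    }
    # Cumulative count of operations at or below the threshold.
    /^storage_operation_duration_seconds_bucket[{]/ && index($0, le) > 0 {
      fast[opname($0)] += $NF
    }
    # Total number of completed operations.
    /^storage_operation_duration_seconds_count[{]/ {
      total[opname($0)] += $NF
    }
    END {
      for (op in total) {
        slow = total[op] - fast[op]
        if (slow > 0) printf "%s %d\n", op, slow
      }
    }
  ' | sort
}

# Usage:
#   kubectl get --raw "/api/v1/nodes/<node>/proxy/metrics" | slow_storage_ops 120
```

Taking two samples a few minutes apart and comparing them shows whether slow operations are still accumulating.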
Fixes
If the cause is a hung NFS or network mount
Do not restart kubelet first. A hung mount may be uninterruptible. Check if mount processes are in state D with ps -eo stat,comm | grep '^D'. If the NFS server recovers, the mount may complete on its own. If the server is permanently gone, force-unmount with umount -f or lazy-unmount with umount -l, then cordon the node and drain volume pods. Forceful unmounts risk data corruption for active writes.
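A minimal, read-only sketch of these checks, meant to run on the affected node. The helper names are hypothetical. `d_state_procs` filters `ps` output for uninterruptible (D-state) processes and includes the kernel wait channel, which often names an RPC or NFS function when the server is the problem. `nfs_mountpoints` lists NFS mounts from `/proc/mounts`. `probe_mount` stats a path from a detached background subshell, so a hung mount does not hang your shell; a probe that hangs is left running, since it may not die until the server returns. Nothing here unmounts anything.

```bash
# Sketch: read-only helpers for diagnosing hung NFS mounts on a node.

# Print "<pid> <wchan> <command>" for processes in uninterruptible sleep (D state).
# Expects `ps -eo pid,stat,wchan:32,comm` output (with header) on stdin.
d_state_procs() {
  awk 'NR > 1 && $2 ~ /^D/ { print $1, $3, $4 }'
}

# Print mount points of NFS filesystems. Expects /proc/mounts format on stdin.
nfs_mountpoints() {
  awk '$3 == "nfs" || $3 == "nfs4" { print $2 }'
}

# Stat a path in the background and report whether it answers within N seconds.
# Returns 0 if stat succeeded, 2 if it failed, 1 if it is still hanging.
probe_mount() {
  local path="$1" wait_s="${2:-5}" marker result i=0
  marker="$(mktemp)" || return 2
  # The subshell writes its result only after stat returns; its stdio is
  # detached so a hung probe cannot block a caller's command substitution.
  (
    if stat "$path" >/dev/null 2>&1; then echo ok > "$marker"; else echo error > "$marker"; fi
  ) </dev/null >/dev/null 2>&1 &
  while [ "$i" -lt "$wait_s" ]; do
    sleep 1
    if [ -s "$marker" ]; then
      result="$(cat "$marker")"
      rm -f "$marker"
      echo "$result: $path"
      [ "$result" = ok ] && return 0
      return 2
    fi
    i=$((i + 1))
  done
  rm -f "$marker"
  echo "hung after ${wait_s}s: $path (background probe left running)"
  return 1
}

# Usage on the node:
#   ps -eo pid,stat,wchan:32,comm | d_state_procs
#   nfs_mountpoints < /proc/mounts | while read -r m; do probe_mount "$m" 5; done
```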
If the cause is a CSI driver failure
Restart the CSI node driver pod on the affected node. If the driver is in CrashLoopBackOff, fix the driver configuration or resource limits first. If the driver is deadlocked, delete the pod. The DaemonSet will recreate it, which often restores the socket. Verify re-registration by checking kubelet logs for registration messages.
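A sketch of that restart-and-verify loop, assuming the node plugin runs as a DaemonSet pod you can select by label. The namespace, label selector, and driver name are placeholders you supply, and `restart_csi_node_pod` and `csinode_has_driver` are hypothetical helper names. Besides kubelet logs, it checks the node's CSINode object, whose `spec.drivers` lists the CSI drivers kubelet has registered on that node. This is state-changing: it deletes the driver pod on one node.

```bash
# Sketch: restart a CSI node plugin pod on one node and confirm re-registration.

# Succeeds if the given driver name appears in a space-separated list on stdin
# (the format printed by the CSINode jsonpath query below).
csinode_has_driver() {
  tr ' ' '\n' | grep -qxF "$1"
}

# Usage: restart_csi_node_pod <namespace> <label-selector> <node> <driver-name>
restart_csi_node_pod() {
  local ns="$1" selector="$2" node="$3" driver="$4" i=0 ready=""
  # Delete the driver pod on this node only; the DaemonSet recreates it.
  kubectl delete pod -n "$ns" -l "$selector" --field-selector "spec.nodeName=$node" || return 1
  # Poll for up to about 2 minutes for the replacement pod to report Ready=True.
  while [ "$i" -lt 24 ]; do
    ready="$(kubectl get pod -n "$ns" -l "$selector" --field-selector "spec.nodeName=$node" \
      -o jsonpath='{.items[*].status.conditions[?(@.type=="Ready")].status}')"
    [ "$ready" = "True" ] && break
    sleep 5
    i=$((i + 1))
  done
  if [ "$ready" != "True" ]; then
    echo "driver pod on $node is not Ready; check its logs" >&2
    return 1
  fi
  # Confirm kubelet lists the driver on this node's CSINode object.
  if kubectl get csinode "$node" -o jsonpath='{.spec.drivers[*].name}' | csinode_has_driver "$driver"; then
    echo "$driver registered on $node"
  else
    echo "$driver not yet listed in CSINode for $node; check kubelet logs" >&2
    return 1
  fi
}
```

CSINode presence is a coarse signal; if the driver stays listed while mounts still fail, the kubelet log registration messages remain the better check.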
If the cause is a stale VolumeAttachment
If a volume is attached to a node that no longer exists or no longer needs it, and the cloud provider has not detached it, delete the stale VolumeAttachment object. Only do this after confirming the disk is no longer in use by the previous node. Forcing deletion may leave the cloud disk in an inconsistent state depending on the CSI driver implementation. After deletion, the attach-detach controller should reattach the volume to the correct node.
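To find candidates for that confirmation, a hypothetical `stale_attachments` helper can compare each VolumeAttachment's node against the nodes that still exist. It only reads and prints. It cannot tell whether the cloud disk is still attached, so confirm that with your cloud provider before deleting anything it reports, and it covers only the deleted-node case; an attachment to a live node that no longer runs the pod still needs a manual check.

```bash
# Sketch: list VolumeAttachments whose node no longer exists. Read-only.
# $1:    file with one existing node name per line
# stdin: "<attachment-name> <node-name> <attached>" per line
stale_attachments() {
  awk '
    # First file: remember every node that still exists.
    FILENAME == ARGV[1] { live[$1] = 1; next }
    # stdin: report attachments that point at an unknown node.
    !($2 in live) { print $1, $2, $3 }
  ' "$1" -
}

# Usage (bash, for process substitution):
#   kubectl get volumeattachments --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,ATTACHED:.status.attached \
#   | stale_attachments <(kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name)
```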
If the cause is orphaned volume directories
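After confirming the pod is fully deleted from the API server, that no processes are using the volume, and that nothing is still mounted beneath the directory (check mount output), remove the orphaned pod volume directory under the kubelet pods path (by default `/var/lib/kubelet/pods/<uid>`). Deleting a directory that still has a volume mounted under it deletes data on the backing storage. Do not delete directories for pods still visible in `kubectl get pods`, even if they are in Terminating.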
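A read-only sketch for finding candidates on the node. The helper name `orphan_pod_dirs` is hypothetical, and the pods path is passed in (the default kubelet location is assumed in the usage comment). It lists pod UID directories that match no pod the API server reports for this node, and flags any that still have mounts underneath. Static pods can show up as false positives, because the UID kubelet uses on disk for a static pod can differ from its mirror pod's UID in the API. It deletes nothing.

```bash
# Sketch: list pod directories on disk with no matching pod UID in the API. Read-only.
# $1: kubelet pods directory
# $2: file with one live pod UID per line
# $3: mounts table to consult (default /proc/mounts)
orphan_pod_dirs() {
  local pods_dir="$1" uids_file="$2" mounts="${3:-/proc/mounts}" dir uid note
  for dir in "$pods_dir"/*/; do
    [ -d "$dir" ] || continue          # the glob matched nothing
    uid="$(basename "$dir")"
    grep -qxF "$uid" "$uids_file" && continue
    note=""
    # A mount still under this directory means rm -rf would reach into the volume.
    if awk -v p="${pods_dir%/}/$uid/" 'index($2, p) == 1 { found = 1 } END { exit !found }' "$mounts"; then
      note=" (STILL HAS MOUNTS)"
    fi
    printf '%s%s\n' "$uid" "$note"
  done
}

# Usage on the node:
#   kubectl get pods --all-namespaces --no-headers --field-selector spec.nodeName=<node> \
#     -o custom-columns=UID:.metadata.uid > /tmp/live-uids
#   orphan_pod_dirs /var/lib/kubelet/pods /tmp/live-uids
```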
If the cause is volume manager saturation
Cordon the node to stop new scheduling. Evict or delete pods stuck in ContainerCreating if they are safe to recreate elsewhere. If terminating pods hang due to unmount failures and the storage backend cannot recover quickly, restarting kubelet is a last resort. It clears the internal volume operation queue, but is disruptive: all pods on the node will be reconciled and existing containers may restart.
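A cautious sketch of the first two steps: cordon, then delete pods stuck in ContainerCreating on that node. It only prints what it would do unless `APPLY=1` is set. The helper names are hypothetical, and "has an owner reference" is used only as a rough proxy for "safe to recreate elsewhere": bare pods with no controller are skipped because deleting them would not bring them back, and StatefulSet pods with ReadWriteOnce volumes may still wait for the old attachment to be released. It does not restart kubelet.

```bash
# Sketch: cordon a node and delete controller-owned pods stuck in ContainerCreating.

# stdin:  "<namespace> <name> <owner-kind or <none>> <waiting reasons or <none>>"
# stdout: "<namespace>/<name>" for pods that look safe to recreate elsewhere.
evict_candidates() {
  awk '$3 != "<none>" && $4 ~ /ContainerCreating/ { print $1 "/" $2 }'
}

# Usage: relieve_saturated_node <node>   (set APPLY=1 to actually make changes)
relieve_saturated_node() {
  local node="$1"
  if [ "${APPLY:-0}" = 1 ]; then
    kubectl cordon "$node" || return 1
  else
    echo "would cordon $node"
  fi
  kubectl get pods --all-namespaces --no-headers --field-selector "spec.nodeName=$node" \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind,WAITING:.status.containerStatuses[*].state.waiting.reason' \
    | evict_candidates \
    | while read -r ref; do
        if [ "${APPLY:-0}" = 1 ]; then
          kubectl delete pod -n "${ref%%/*}" "${ref#*/}"
        else
          echo "would delete pod $ref"
        fi
      done
}
```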
Prevention
- Monitor storage operation latency. Alert on `storage_operation_duration_seconds` p99 above 30 seconds or any operation exceeding 2 minutes.
- Avoid force-deleting pods with volumes. Use graceful deletion with an adequate `terminationGracePeriodSeconds` to allow clean unmounts.
- Ensure CSI drivers are well-provisioned. CSI node driver pods should have sufficient CPU and memory, and their DaemonSet should use a `PodDisruptionBudget` where appropriate.
- Clean up VolumeAttachments after node deletions. When removing a node from the cluster, verify that its volumes are detached before deleting the node object.
- Set mount timeout policies where possible. For NFS, use `soft` mount options with reasonable `timeo` values to prevent indefinite hangs, understanding the tradeoff of potential data inconsistency.
- Monitor inode usage. Volume metadata and logs can exhaust inodes before disk space, so track `df -i` alongside `df -h`.
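Before changing mount options, it helps to know which NFS mounts on a node are hard-mounted (the Linux NFS default), since those are the ones that can block indefinitely. A small read-only sketch, with the hypothetical name `nfs_mount_modes`, reads `/proc/mounts` and prints each NFS mount point with its hard or soft mode and its `timeo` value, which is expressed in tenths of a second. For Kubernetes-managed NFS volumes, these options typically come from the PersistentVolume's or StorageClass's `mountOptions`; the sketch only reports them.

```bash
# Sketch: report hard/soft mode and timeo for each NFS mount. Read-only.
# Expects /proc/mounts format on stdin; prints "<mountpoint> <hard|soft> timeo=<value>".
nfs_mount_modes() {
  awk '
    $3 == "nfs" || $3 == "nfs4" {
      mode = "hard"; timeo = "default"
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) {
        if (opts[i] == "soft" || opts[i] == "softerr") mode = "soft"
        if (opts[i] ~ /^timeo=/) timeo = substr(opts[i], 7)
      }
      print $2, mode, "timeo=" timeo
    }
  '
}

# Usage on the node:
#   nfs_mount_modes < /proc/mounts
```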
How Netdata helps
Netdata correlates storage_operation_duration_seconds with pod startup latency to isolate volume bottlenecks. It alerts on per-node volume operation latency spikes before queue saturation, tracks node-level disk pressure and inode exhaustion alongside kubelet volume metrics, visualizes the gap between desired and running pods per node to detect backlog, and surfaces kubelet error log patterns for mount and attach failures without manual log scanning.
Related guides
- Kubernetes CSI driver failures: detection, recovery, and version skew
- Kubernetes Deployment rollout stuck: stalled rollouts and ready replicas
- Kubernetes DaemonSet pods Pending: scheduling and tolerations
- Kubernetes DNS resolution failures inside pods
- Kubernetes conntrack exhaustion: dropped connections under load