Kubernetes NetworkPolicy debugging: when traffic is denied silently
A pod that could reach its dependency yesterday times out today. There is no TCP RST, no ICMP unreachable, and often no application log. The packet is dropped in the CNI data plane. If a policy change, namespace reorganization, or cluster upgrade preceded the outage, you are likely dealing with silent NetworkPolicy denial. This guide shows how to confirm it, find the rule or semantic gap responsible, and restore connectivity without opening the cluster to all traffic.
What this means
Kubernetes NetworkPolicy is an isolation mechanism, not a firewall that judges traffic. A pod is non-isolated by default: all ingress and egress traffic is allowed. Once any NetworkPolicy with Ingress in policyTypes selects a pod, that pod becomes isolated for ingress. Only traffic matching an explicit allow rule in a policy that selects the pod is permitted. Reply traffic is implicitly allowed, but the initial connection must be explicitly permitted. The same logic applies to egress when Egress appears in policyTypes.
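As a minimal sketch of that isolation rule, the manifest below selects pods labeled app: db and allows ingress only from pods labeled app: api on TCP 5432. The policy name, namespace, labels, and port are hypothetical and only illustrate the shape. Once a policy like this is applied, any other source that tries to connect to a db pod would typically see a timeout rather than a refusal.

```yaml
# Hypothetical example: name, namespace, labels, and port are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-api
  namespace: shop
spec:
  # Selecting the pods is what isolates them for the listed policyTypes.
  podSelector:
    matchLabels:
      app: db
  policyTypes:
    - Ingress
  ingress:
    # Only this source may open connections; replies flow back implicitly.
    - from:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - protocol: TCP
          port: 5432
```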
Effects of multiple policies are additive. Order does not matter. For a connection to succeed, both the egress policy on the source pod and the ingress policy on the destination pod must allow it.
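To illustrate the two-sided check, here is a sketch of a source-side egress policy with hypothetical names. It lets app: api pods open connections to app: db pods on TCP 5432, but the connection still fails unless the policies selecting the db pods allow ingress from app: api. Because this policy isolates the api pods for egress, everything not listed, including DNS, is denied until another policy allows it.

```yaml
# Hypothetical example: name, namespace, labels, and port are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress-to-db
  namespace: shop
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Egress
  egress:
    # Source side only; the destination's ingress policies must agree.
    - to:
        - podSelector:
            matchLabels:
              app: db
      ports:
        - protocol: TCP
          port: 5432
```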
When a deny-all NetworkPolicy is defined, it is only guaranteed to deny TCP, UDP, and SCTP connections. Behavior for other protocols, such as ICMP, is undefined and varies by CNI plugin. Cilium, for example, blocks ICMP unless explicitly permitted.
Because enforcement happens in the CNI data plane, the sender typically sees a connection timeout. This silence makes NetworkPolicy a common cause of “mystery” connectivity outages.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Missing DNS egress rule | Service discovery fails; curl by hostname hangs; no application error | Whether UDP/TCP port 53 to kube-dns is explicitly allowed |
| namespaceSelector targets unlabeled namespace | Cross-namespace traffic fails despite a policy “allowing namespace X” | kubectl get namespace <name> --show-labels |
| Empty podSelector: {} scope confusion | Operator assumes {} grants cross-namespace access, but it only matches pods in the policy’s own namespace | Whether a namespaceSelector is present alongside the podSelector |
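| Incomplete or omitted policyTypes | Egress rules exist but are ignored because policyTypes lists only Ingress, or an egress-only policy also isolates ingress because omitted policyTypes always implies Ingress | The policyTypes field in the NetworkPolicy manifest |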
| AWS VPC CNI port limit exceeded | Silent pod-to-pod failures after migrating from Calico to VPC CNI on EKS 1.30+ | Number of port entries per selector; consolidate with endPort |
| hostNetwork pod bypass | Policies appear to have no effect for specific workloads | Whether the affected pod uses hostNetwork: true |
| ICMP denied by Cilium | Ping fails between pods even though TCP/UDP on the same path works | Cilium-specific ICMP allow rules |
Quick checks
Use these checks to confirm silent NetworkPolicy denial.
# List all NetworkPolicies in the destination namespace
kubectl get networkpolicy -n <destination-ns> -o yaml
# Check which policies select the destination pod by its labels
kubectl get pods -n <destination-ns> --show-labels
# Then match against each policy's podSelector and namespaceSelector
# Test connectivity by pod IP to bypass Service DNAT
kubectl exec -n <source-ns> <source-pod> -- wget -qO- --timeout=5 http://<dest-pod-ip>:<port>
# Test DNS resolution from the source pod
kubectl exec -n <source-ns> <source-pod> -- nslookup <target-service>
# Verify namespace labels (namespaceSelector matches these, not pod labels)
kubectl get namespace <name> --show-labels
# For Calico: check Felix metrics for policy drops
kubectl exec -n kube-system <calico-node-pod> -- wget -qO- http://localhost:9091/metrics | grep felix_
# For Cilium: observe drops in real time
kubectl exec -n kube-system <cilium-pod> -- cilium monitor --type drop
What good looks like: a cross-namespace ingress policy has a namespaceSelector whose labels match the labels on the source namespace object, and the destination pod is selected by at least one policy that includes the source in its from rules. If no NetworkPolicy selects the destination pod, it is non-isolated and NetworkPolicy is not the cause.
How to diagnose it
Follow this flow to isolate the offending policy or CNI behavior.
1. Confirm the symptom is a silent drop. If application logs show “Connection refused,” the target port is not listening or a Service has no endpoints. If they show “NXDOMAIN,” the issue is DNS. A NetworkPolicy denial produces a timeout or hang with no response.
2. Determine if the destination pod is isolated. List all NetworkPolicies in the destination namespace. If the spec.podSelector of any policy matches the destination pod, the pod is isolated in the directions declared in policyTypes. If no policy selects it, look elsewhere.
3. Verify ingress allows the source. For an isolated destination, inspect every policy that selects it. Check whether any ingress rule permits the source. Remember that from requires both the source pod labels and, if cross-namespace, the namespace labels to match. A bare podSelector: {} inside an ingress rule only matches pods in the same namespace as the policy.
4. Verify egress allows the destination. Inspect policies in the source namespace. If the source is isolated for egress, check whether an egress rule permits the destination by pod labels, namespace labels, or CIDR. A connection requires both sides to agree.
5. Check the DNS egress trap. If the failure involves hostnames or Kubernetes Services, test DNS resolution from the source pod. Most default-deny or restrictive egress policies omit port 53. See the DNS egress fix under Fixes.
6. Inspect namespace labels. If you use namespaceSelector in a rule, verify that the namespace object itself carries the expected labels. Apart from the automatic kubernetes.io/metadata.name label, most Kubernetes distributions do not label namespaces by default.
7. Validate policyTypes. If a policy contains egress rules but policyTypes lists only Ingress, the egress rules are not applied. If policyTypes is omitted, Ingress is always implied and Egress is implied only when egress rules are present. Always declare the intended directions explicitly.
8. Test CNI-specific behavior. If the above checks pass but traffic still fails, look at your CNI. Flannel does not enforce NetworkPolicy. For hostNetwork pods, enforcement varies: some CNIs cannot distinguish hostNetwork traffic from node traffic and ignore selectors for those pods. In Cilium, fromCIDR/toCIDR rules only match non-pod endpoints; pod-to-pod traffic must use label selectors.
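When checking from rules, it helps to keep in mind how selectors combine. The fragment below is a sketch with hypothetical labels that contrasts the two forms: selectors inside one list element must both match, while separate list elements are alternatives.

```yaml
# Fragment of a NetworkPolicy spec; all labels are hypothetical.
ingress:
  # One element: pods labeled app=frontend AND in a namespace labeled team=web.
  - from:
      - namespaceSelector:
          matchLabels:
            team: web
        podSelector:
          matchLabels:
            app: frontend
  # Two elements: any pod in a namespace labeled team=web,
  # OR pods labeled app=frontend in the policy's own namespace.
  - from:
      - namespaceSelector:
          matchLabels:
            team: web
      - podSelector:
          matchLabels:
            app: frontend
```

A single misplaced dash turns the first form into the second, which is broader than intended on one side and does not grant cross-namespace access to app=frontend pods on the other.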
flowchart TD
A[Pod cannot reach target] --> B{Connection refused or timeout?}
B -->|Timeout| C[Check NetworkPolicies selecting source and destination]
B -->|Refused| Z[Check Service endpoints and port binding]
C --> D{Destination isolated?}
D -->|No| E[Check CNI plugin enforcement capability]
D -->|Yes| F{Ingress allows source?}
F -->|No| G[Fix ingress rule or namespace labels]
F -->|Yes| H{Egress allows destination?}
H -->|No| I[Fix egress rule or CIDR scope]
H -->|Yes| J{Failure involves hostnames?}
J -->|Yes| K[Test DNS from source pod]
K -->|Fails| L[Add DNS egress rule for port 53]
J -->|No| M[Review CNI-specific behavior]
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| CNI plugin health (DaemonSet pod restarts) | Policy enforcement stops if the CNI agent crashes or is OOM killed | CNI pods restarting or stuck in CrashLoopBackOff |
| Pod-to-pod connectivity test failures | Direct confirmation of NetworkPolicy-like drops | Timeouts between known-healthy pods on specific nodes |
| DNS resolution latency/failures from workloads | The most common symptom of missing DNS egress rules | nslookup failures correlated with policy rollout |
| felix_* metrics (Calico) | Felix programs the rules; elevated drop metrics confirm policy denial | Increasing felix_iptables_* or policy-related drop counters |
| Cilium DROP_POLICY_DENIED events | Cilium annotates drops with a reason; this one confirms policy | cilium monitor output showing policy drops between source and dest |
| Cluster NetworkPolicy object count | Rapid growth increases collision risk and debugging surface | Sudden spikes in policy count without change management |
Fixes
If the cause is missing DNS egress
Add an explicit egress rule to your default-deny or restrictive policy:
- to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
  ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
Allow both UDP and TCP. DNS falls back to TCP when a response does not fit in a UDP datagram.
If the cause is namespaceSelector mismatch
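Apply the expected label to the namespace object itself, or change the policy to match labels the namespace already has. Selectors operate on metadata labels, not on the namespace name. The only name-derived label you can rely on is kubernetes.io/metadata.name, which the control plane sets automatically on recent Kubernetes versions.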
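One way to make the label part of provisioning is to declare it on the Namespace manifest itself. In this sketch the namespace name and the team label are hypothetical; the commented selectors show how a policy in another namespace could then refer to it.

```yaml
# Hypothetical namespace; the team label is what a namespaceSelector would match.
apiVersion: v1
kind: Namespace
metadata:
  name: web
  labels:
    team: web
# A policy elsewhere could then select this namespace with either of:
#   namespaceSelector: {matchLabels: {team: web}}
#   namespaceSelector: {matchLabels: {kubernetes.io/metadata.name: web}}
```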
If the cause is omitted policyTypes
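Declare policyTypes explicitly in every policy, listing each direction the policy is meant to govern. For a policy that governs both:
policyTypes:
- Ingress
- Egress
A direction that is listed but has no rules denies all traffic in that direction. A direction that is left out is not regulated by this policy, and any rules written for it are ignored. If policyTypes is omitted entirely, Ingress is always implied and Egress is implied only when egress rules are present.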
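For reference, a namespace-wide default-deny in both directions might look like the sketch below; the policy name and namespace are hypothetical. The empty podSelector selects every pod in the namespace, and because both directions are listed with no rules, all ingress and egress is denied until additional policies allow DNS and whatever else the workloads need.

```yaml
# Hypothetical example: name and namespace are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: shop
spec:
  # {} selects every pod in this namespace (and only this namespace).
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```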
If the cause is AWS VPC CNI port limits
AWS VPC CNI limits each protocol in each selector to 24 unique port combinations. Reduce the port list or use endPort to specify ranges. If you migrated from Calico, audit existing policies for large port lists.
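As a sketch of the consolidation, a contiguous run of ports can be written as one entry with endPort instead of one entry per port. The selector label and the range are hypothetical, and endPort needs a numeric port and a CNI that supports the field.

```yaml
# Fragment of a NetworkPolicy spec; selector and port range are hypothetical.
egress:
  - to:
      - podSelector:
          matchLabels:
            app: worker
    ports:
      # One entry covering 8000-8100 instead of 101 separate port entries.
      - protocol: TCP
        port: 8000
        endPort: 8100
```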
If the cause is hostNetwork or CNI bypass
For hostNetwork pods, do not rely solely on NetworkPolicy for isolation. Add node-level firewall rules or run the workload as a normal pod. If you use Flannel, be aware that NetworkPolicy objects are accepted by the API but never enforced.
If the cause is ICMP under Cilium
Standard NetworkPolicy can only express TCP, UDP, and SCTP ports, so ICMP cannot be allowed there. If ICMP is required for your operational health checks, use a CiliumNetworkPolicy with icmps rules.
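A minimal sketch of such a rule, assuming the cilium.io/v2 CiliumNetworkPolicy CRD is installed and that your Cilium version supports the icmps field. The name, namespace, and labels are hypothetical. It is intended to allow IPv4 echo requests (ICMP type 8) from monitoring pods to the selected endpoints; verify the field layout against the documentation for your Cilium release before relying on it.

```yaml
# Hypothetical example: name, namespace, and labels are assumptions.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-ping-from-monitoring
  namespace: shop
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: monitoring
      icmps:
        - fields:
            - type: 8        # ICMPv4 echo request
              family: IPv4
```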
Prevention
- Always include DNS egress in any default-deny or restrictive egress policy. Service discovery depends on it, and its absence is the top cause of silent breakage.
- Validate namespace labels before deploying policies that rely on namespaceSelector. Add labels to namespaces as part of namespace provisioning.
- Declare policyTypes explicitly in every NetworkPolicy, even if the default behavior appears correct in testing.
- Stage policies with real traffic before production. A policy that looks correct in a YAML linter can still deny critical control plane or sidecar traffic.
- Monitor CNI health alongside application metrics. If the CNI agent is unhealthy, policy enforcement is inconsistent or absent.
- Document cross-cluster behavior if you use Cilium Cluster Mesh. Cilium may restrict label-based selectors to the local cluster by default; remote cluster traffic may require explicit cluster label selectors.
How Netdata helps
Netdata surfaces the silent nature of these failures by correlating signals that application logs miss:
- Correlate sudden drops in inter-pod network throughput with CNI plugin CPU, memory, or restart events.
- Monitor DNS resolution latency at the node level to catch the DNS egress trap.
- Track kernel conntrack utilization and drop rates when policy changes increase connection churn.
- Map per-node network anomalies alongside Kubernetes workload events to identify the policy rollout that coincided with the first timeouts.






