Asymmetric routing: why your path and latency measurements lie
Your monitoring says the path is fine. Ping latency is normal, traceroute shows a clean route, and interface counters look healthy. But applications are slow, TCP sessions stall or reset, and users are complaining. Your tools are measuring only half the path.
In asymmetric routing, traffic from host A to host B takes one path (P1) while return traffic from B to A takes a different path (P2). When P2 is degraded, congested, or broken, a round-trip measurement folds the healthy forward path and the impaired return path into a single number, so you cannot tell which direction is at fault. Every acknowledgment and response is fighting through a bad route while the aggregate looks acceptable.
Standard traceroute makes this worse. It reports round-trip time (forward plus return combined) and traces only the forward path hop by hop. A latency spike on the return leg cannot be attributed to the return leg. Classic traceroute also produces phantom routes under equal-cost multipath (ECMP): each probe changes a header field that routers hash on (the UDP destination port in the default mode), so ECMP routers hash different probes onto different physical paths. The resulting hop list is synthetic. No single packet traversed that route.
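The phantom-route effect can be illustrated with a toy model of per-flow ECMP. This is a sketch under stated assumptions, not how any particular router works: the SHA-256 hash, the next-hop names, and the addresses are all hypothetical, and real devices use vendor-specific hash functions and inputs. The probe loop varies the destination port, which is one of the fields a hash over the 5-tuple would see.

```python
import hashlib

# Hypothetical equal-cost next hops behind one ECMP router
NEXT_HOPS = ["core-a", "core-b", "core-c", "core-d"]

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, next_hops=NEXT_HOPS):
    """Pick a next hop by hashing the 5-tuple (toy model of per-flow ECMP)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

# Probes whose port changes every time: each probe is a "new flow" to the router
varying = {
    ecmp_next_hop("192.0.2.10", "198.51.100.20", 40000, 33434 + i, "udp")
    for i in range(30)
}

# Paris-style probing: the 5-tuple is identical for every probe
constant = {
    ecmp_next_hop("192.0.2.10", "198.51.100.20", 40000, 33434, "udp")
    for _ in range(30)
}

print("varying-port probes used next hops:", sorted(varying))
print("constant 5-tuple probes used next hops:", sorted(constant))
```

With a varying port the probes spread across several next hops, so consecutive TTL values can be answered by routers on different physical paths. With a constant 5-tuple every probe lands on the same next hop.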
What this means
Asymmetric routing is not itself a failure. Many production networks route asymmetrically by design: BGP policy differences across peers, per-flow load balancing across unequal links, policy-based routing for traffic engineering, and cloud provider peering arrangements all create paths where forward and return traffic diverge. The problem arises when one direction degrades and your monitoring cannot see it.
The failure pattern is specific: forward-path probe latency reads healthy, reverse-path probe latency is elevated or shows loss, ICMP RTT shows high variance, and application-layer RTT from flow data shows high p99 with low p50. Traceroute from A to B and from B to A tells different stories. BGP route changes often appear around the same time.
flowchart LR
A["Host A"] -->|"P1: forward path - healthy"| B["Host B"]
B -.->|"P2: return path - degraded or lossy"| A
Measurements must be taken in both directions independently. A single-direction probe, or a round-trip measurement that conflates both directions, will hide the problem.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| BGP policy asymmetry | Routes advertised differently in each direction; one peer preferred outbound, another inbound | show ip bgp <prefix> on both ends; compare AS-path and next-hop |
| Per-flow load balancing (ECMP) with unequal paths | Intermittent loss or latency variance; some flows affected, others fine | Paris traceroute with constant 5-tuple to identify ECMP hashing |
| Route redistribution asymmetry | Different routing protocols redistributing routes differently on each device | Compare route tables on both endpoints: show ip route <prefix> |
| Policy-based routing (PBR) | Forward traffic matches one PBR rule, return matches another | Check PBR policy maps on both devices |
| NAT in one direction | Return traffic sourced from a different IP; stateful firewalls may drop unsolicited return packets | Check NAT translation logs; compare pre-NAT and post-NAT addresses |
| Linux rp_filter in strict mode | Return packets silently dropped by kernel because they arrive on an unexpected interface | sysctl net.ipv4.conf.all.rp_filter (1 = strict, 2 = loose) |
| Stateful firewall on asymmetric path | TCP SYN crosses firewall A; SYN-ACK returns via a different path; firewall A never sees the SYN-ACK, so the client's completing ACK fails state validation and the session times out | Check firewall session tables for half-open connections |
Quick checks
Run these from both endpoints where possible. All are read-only and non-disruptive.
# Traceroute in both directions - compare hop counts, paths, and latency
traceroute -n <target>
# Then run from the target back to your source
# mtr for sustained path monitoring (shows per-hop loss)
mtr -n -c 100 <target>
# Paris traceroute: holds the 5-tuple constant to defeat ECMP hash variation
paris-traceroute -n <target>
# Check Linux reverse-path filter setting (1=strict, 2=loose, 0=disabled)
sysctl net.ipv4.conf.all.rp_filter
sysctl net.ipv4.conf.default.rp_filter
# Also check per-interface overrides
sysctl net.ipv4.conf.eth0.rp_filter
# Show which route the kernel uses for a specific destination
ip route get <target_ip>
# Check per-direction interface utilization via SNMP
snmpwalk -v2c -c <community> <device> .1.3.6.1.2.1.31.1.1.1.6 # ifHCInOctets
snmpwalk -v2c -c <community> <device> .1.3.6.1.2.1.31.1.1.1.10 # ifHCOutOctets
# Compare BGP route tables from both perspectives
ssh <router> 'show ip bgp summary'
ssh <router> 'show ip route <prefix>'
How to diagnose it
Run traceroute in both directions. This is the single most important step. If the hop lists differ, routing is asymmetric. Compare not just the hop count but the intermediate routers and the latency at each hop. A 5ms forward path paired with an 80ms return path is the signature of asymmetric degradation.
Use Paris traceroute to rule out ECMP artifacts. Classic traceroute changes a hashed header field per probe (the UDP destination port in the default mode), causing ECMP routers to hash each probe onto a potentially different physical path. The stitched hop list is synthetic. Paris traceroute holds the full 5-tuple (source IP, destination IP, source port, destination port, protocol) constant so every probe hashes to the same ECMP path.
Check for high RTT variance. High p99 with low p50 RTT on ICMP or application-layer probes means some packets are taking a longer path. This is the signature of partial asymmetry where some flows follow one route and others follow another. A jitter value greater than 0.3 times the mean RTT is a queueing or path-divergence indicator.
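The two rules of thumb for RTT variance can be checked against a list of samples. The sketch below assumes jitter is the population standard deviation of the samples and uses a nearest-rank p99; both are illustrative choices, and the function name and sample values are hypothetical.

```python
import statistics

def rtt_variance_report(rtts_ms, jitter_limit=0.3, ratio_limit=3.0):
    """Summarize RTT samples and flag high variance.

    Assumptions: jitter is the population standard deviation, and p99 is
    taken with the nearest-rank method.
    """
    ordered = sorted(rtts_ms)
    mean = statistics.fmean(ordered)
    jitter = statistics.pstdev(ordered)
    p50 = statistics.median(ordered)
    # Nearest-rank p99: ceil(0.99 * n), computed with integers to avoid float drift
    rank = max(1, (99 * len(ordered) + 99) // 100)
    p99 = ordered[rank - 1]
    return {
        "mean": mean,
        "p50": p50,
        "p99": p99,
        "jitter_over_mean": jitter / mean,
        "p99_over_p50": p99 / p50,
        "flagged": jitter / mean > jitter_limit or p99 / p50 > ratio_limit,
    }

# Hypothetical samples: 95% of probes take a 5 ms path, 5% take an 80 ms path
mixed = [5.0] * 95 + [80.0] * 5
steady = [5.0, 5.2, 4.9, 5.1] * 25

print(rtt_variance_report(mixed))
print(rtt_variance_report(steady))
```

The mixed sample keeps a low p50 while p99 jumps to the slow path, which is the pattern described above. The steady sample stays under both limits.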
Compare forward and reverse flow data. If you collect flow data (NetFlow, IPFIX, sFlow) at both endpoints, compare byte and packet counts for the same 5-tuple in both directions. Asymmetric routing is normal in many networks. What matters is consistency over time. A sudden step change in the forward/reverse ratio indicates a routing change or a failed link.
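A step change in the forward/reverse ratio can be detected by comparing each interval against a trailing baseline. This is a minimal sketch: the function name, the six-interval window, and the 50% relative threshold are assumptions to tune against your own traffic, and the byte counts are made up for illustration.

```python
import statistics

def ratio_step_changes(forward_bytes, reverse_bytes, window=6, rel_threshold=0.5):
    """Return indices of intervals whose forward/reverse byte ratio deviates
    from the median of the preceding `window` intervals by more than
    `rel_threshold` (relative deviation)."""
    # Guard against division by zero when an interval carried no reverse traffic
    ratios = [f / max(r, 1) for f, r in zip(forward_bytes, reverse_bytes)]
    flagged = []
    for i in range(window, len(ratios)):
        baseline = statistics.median(ratios[i - window:i])
        if baseline > 0 and abs(ratios[i] - baseline) / baseline > rel_threshold:
            flagged.append(i)
    return flagged

# Hypothetical per-interval byte counts for one 5-tuple, seen at one collector
forward = [1000] * 10
reverse = [100] * 8 + [20, 20]  # reverse bytes collapse in the last two intervals

print(ratio_step_changes(forward, reverse))
```

Using the median of a trailing window keeps a single noisy interval from moving the baseline, so a stable but lopsided ratio is never flagged while a sudden shift is.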
Check rp_filter on Linux hosts. Strict mode (value 1) drops packets whose source address is reachable only via a different interface than the one the packet arrived on. In asymmetric routing, legitimate return packets may arrive on the “wrong” interface from the kernel’s perspective. Loose mode (value 2) drops packets only when no route to the source exists at all. Mode 2 is required when asymmetric routing is present. The effective per-interface value is max(all, interface_setting), so setting all=2 overrides any individual interface still set to 1.
Examine BGP route tables from both ends. Compare what each router sees as the best path to the other. Policy differences, AS-path prepending, or community-tag-based local preference can cause each side to prefer different upstreams. Look for BGP route changes around the time the symptoms started.
Check stateful firewall session tables. If a stateful firewall sits on one direction of the path, it expects to see the full TCP handshake (SYN, SYN-ACK, ACK). When the SYN-ACK returns via a different path, the firewall never sees it, so the client's completing ACK does not match the state the firewall is tracking. After the embryonic connection timeout (typically 30-60 seconds depending on vendor), it purges the half-open session. Subsequent packets from the client are silently dropped. Look for half-open connections or a high rate of session table purges.
Verify interface utilization per direction. Asymmetric saturation (one direction at capacity, the other idle) is a symptom, not a cause. If you see 95% utilization inbound on one interface and 5% outbound on the same interface, while the return path uses a different interface, the problem is capacity on the saturated direction.
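Per-direction utilization comes from two samples of the ifHCInOctets and ifHCOutOctets counters. The sketch below shows the arithmetic; the function names, the 90%/10% thresholds, and the counter values are hypothetical, and the interface speed is assumed to be known (for example from ifHighSpeed).

```python
def direction_utilization(octets_t0, octets_t1, interval_s, if_speed_bps, counter_bits=64):
    """Fraction of link capacity used in one direction between two counter samples."""
    # Modulo arithmetic tolerates a single counter wrap between samples
    delta_octets = (octets_t1 - octets_t0) % (2 ** counter_bits)
    return delta_octets * 8 / (interval_s * if_speed_bps)

def is_asymmetric_saturation(in_util, out_util, high=0.9, low=0.1):
    """True when one direction is near capacity while the other is nearly idle."""
    return (in_util >= high and out_util <= low) or (out_util >= high and in_util <= low)

# Hypothetical samples taken 60 seconds apart on a 1 Gbit/s interface
SPEED_BPS = 1_000_000_000
in_util = direction_utilization(10_000_000_000, 17_125_000_000, 60, SPEED_BPS)
out_util = direction_utilization(2_000_000_000, 2_375_000_000, 60, SPEED_BPS)

print(f"inbound {in_util:.0%}, outbound {out_util:.0%}")
print("asymmetric saturation:", is_asymmetric_saturation(in_util, out_util))
```

Averaging the two directions here would report 50% utilization and hide the saturated side, which is why the directions are kept separate.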
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Forward-path probe latency (IPSLA, TWAMP, HTTP) | Measures one direction independently | Healthy while applications are failing |
| Reverse-path probe latency | Measures the return direction independently | Elevated or showing loss while forward path is clean |
| ICMP RTT variance (jitter) | High variance indicates some packets taking different paths | p99/p50 ratio greater than 3 |
| Application-layer RTT from flows | Real user experience, including retransmits | High p99 with low p50 |
| Per-direction interface utilization | Reveals asymmetric saturation | One direction near 100%, other idle |
| Forward/reverse flow byte ratio | Detects routing changes | Sudden step change from baseline |
| BGP prefix count per peer | Route changes cause path shifts | Sudden increase or decrease around symptom onset |
| Linux rp_filter setting | Strict mode silently drops asymmetric return packets | Value = 1 when asymmetric routing is expected |
| Stateful firewall session purge rate | Half-open sessions indicate asymmetric handshakes | High purge rate for incomplete connections |
Fixes
Fix Linux rp_filter blocking return traffic
If the return path arrives on a different interface than the kernel expects, strict reverse-path filtering (mode 1) will silently drop those packets. Set loose mode (2), which validates only that a route to the source exists, not that it matches the arriving interface.
# Set to loose mode (takes effect immediately)
sysctl -w net.ipv4.conf.all.rp_filter=2
sysctl -w net.ipv4.conf.default.rp_filter=2
# Persist across reboots
echo "net.ipv4.conf.all.rp_filter=2" >> /etc/sysctl.d/99-asymmetric-routing.conf
echo "net.ipv4.conf.default.rp_filter=2" >> /etc/sysctl.d/99-asymmetric-routing.conf
sysctl --system
Warning: switching from strict to loose mode reduces protection against spoofed source addresses. Apply only where asymmetric routing is known to occur, and ensure perimeter filtering handles spoofing at the network edge.
Some modern distributions default to loose mode (2), but enterprise distributions and hardened baselines often set strict mode (1). If you are running a hardened baseline, this is a common source of mysterious packet loss.
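Before changing anything, it can help to work out which interfaces are effectively in strict mode, since the kernel uses the larger of the all value and the per-interface value. The sketch below parses text in the `key = value` format that sysctl prints; the function names and the sample output are hypothetical.

```python
PREFIX = "net.ipv4.conf."
SUFFIX = ".rp_filter"

def parse_rp_filter(sysctl_text):
    """Parse 'net.ipv4.conf.<iface>.rp_filter = N' lines into {iface: N}."""
    settings = {}
    for line in sysctl_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key.startswith(PREFIX) or not key.endswith(SUFFIX):
            continue
        settings[key[len(PREFIX):-len(SUFFIX)]] = int(value.strip())
    return settings

def strict_interfaces(settings):
    """Interfaces whose effective value, max(all, interface), is 1 (strict)."""
    all_value = settings.get("all", 0)
    return sorted(
        iface for iface, value in settings.items()
        if iface not in ("all", "default") and max(all_value, value) == 1
    )

# Hypothetical output collected from a host
SAMPLE = """\
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.eth0.rp_filter = 1
net.ipv4.conf.eth1.rp_filter = 2
"""

print(strict_interfaces(parse_rp_filter(SAMPLE)))
```

In the sample, eth0 is effectively strict because neither all nor its own setting raises the value above 1. Setting all to 2 would lift every interface to loose mode.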
Fix BGP policy asymmetry
If the asymmetry is unintentional, the fix is routing policy correction: align BGP local preference, AS-path prepending, or MED values so that both endpoints prefer the same path for both directions. If the asymmetry is intentional (traffic engineering, cost optimization), the fix is monitoring, not routing. Ensure your probes measure each direction independently.
Fix stateful firewall drops
Stateful firewalls require seeing both directions of a TCP connection. When the handshake is split across paths, the firewall sees only the SYN and never the SYN-ACK. Vendor-specific mechanisms exist to handle this. pfSense offers a “sloppy” state type that does not enforce handshake sequencing.
Alternatively, restructure routing so that both directions of a flow traverse the same firewall. In cloud environments, this is particularly important: AWS Network Firewall expects symmetric flow state, and Azure ExpressRoute takes priority over coexisting Site-to-Site VPN connections, which can silently black-hole return traffic.
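The handshake-tracking behavior described in this section can be sketched as a toy state machine for a single flow. This is a deliberately simplified model, not any vendor's implementation: the state names and the `sloppy` flag are hypothetical, and real firewalls also track sequence numbers, timeouts, and direction.

```python
def firewall_verdicts(seen_flags, sloppy=False):
    """Return an accept/drop verdict for each packet of one TCP flow,
    as seen by a single toy stateful firewall.

    seen_flags lists only the packets that actually cross this firewall.
    """
    state = "NONE"
    verdicts = []
    for flags in seen_flags:
        if flags == "SYN" and state == "NONE":
            state = "SYN_SEEN"
            verdicts.append("accept")
        elif flags == "SYN-ACK" and state == "SYN_SEEN":
            state = "SYNACK_SEEN"
            verdicts.append("accept")
        elif flags == "ACK" and (
            state in ("SYNACK_SEEN", "ESTABLISHED")
            or (sloppy and state == "SYN_SEEN")  # relaxed mode skips handshake ordering
        ):
            state = "ESTABLISHED"
            verdicts.append("accept")
        else:
            verdicts.append("drop")
    return verdicts

# Symmetric path: the firewall sees the whole handshake
print(firewall_verdicts(["SYN", "SYN-ACK", "ACK", "ACK"]))

# Asymmetric path: the SYN-ACK bypasses this firewall
print(firewall_verdicts(["SYN", "ACK", "ACK"]))

# Same asymmetric flow with relaxed handshake enforcement
print(firewall_verdicts(["SYN", "ACK", "ACK"], sloppy=True))
```

The asymmetric case accepts the SYN and then drops everything after it, which matches the half-open sessions you would look for in the session table. Relaxed enforcement lets the flow through at the cost of weaker state validation.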
Fix monitoring blind spots
The most important fix is often not a routing change but a monitoring change. If your probes measure only round-trip latency, they will always hide reverse-path degradation. Deploy active probes in both directions: IPSLA from A to B and from B to A, or TWAMP sessions that measure one-way delay. Track forward and reverse flow data separately so you can detect when the ratio changes.
Prevention
- Measure both directions independently. Round-trip probes are necessary but not sufficient. Forward and reverse probes together reveal asymmetric degradation that aggregate measurements hide.
- Baseline the forward/reverse flow ratio. Asymmetric routing is normal in many networks. A sudden step change in the ratio is the event that indicates a routing problem.
- Verify rp_filter settings after provisioning new hosts. Strict mode is the default in some hardened baselines and will silently break asymmetric routing. Check it as part of your host bring-up checklist.
- Document intentional asymmetry. If traffic engineering intentionally routes return traffic differently, ensure the operations team knows and monitoring measures both paths. Undocumented intentional asymmetry looks identical to a routing failure during an incident.
- Use Paris traceroute in runbooks. Classic traceroute produces misleading results under ECMP. Standardize on Paris traceroute (or equivalent constant-5-tuple probing) for path diagnosis.
How Netdata helps
- Per-direction interface utilization collected via the SNMP plugin lets you see asymmetric saturation (one direction at 100%, the other idle) without manual walks.
- ICMP RTT probes with per-sample granularity reveal high variance that aggregate metrics miss. Correlate RTT p99 spikes with BGP route changes to confirm path divergence.
- Flow data collection from NetFlow, IPFIX, and sFlow can be correlated across collection points to detect sudden changes in the forward/reverse byte ratio.
- BGP monitoring tracks prefix counts, session state, and route changes, so you can correlate routing events with path degradation timestamps.
- Custom alerting on RTT p99/p50 ratio detects partial asymmetry where some flows take a different path, surfacing the problem before users complain.
Related guides
- BGP route leak and hijack: the detection signals and alerts that matter
- BGP session Established but stale: detecting silent route loss
- Network monitoring checklist: the signals every production network needs
- SNMP poll response latency: diagnosing a slow poller
- Flow export-to-ingest latency: why your NetFlow data is minutes behind







