Stale FDB/MAC tables: why endpoint location is wrong

Your topology platform says endpoint aa:bb:cc:dd:ee:ff is on switch port Gi1/0/24. Your security team sends someone to that port. The endpoint is not there. It moved hours ago, or it went offline, or it vMotioned to a different host. The FDB entry was stale and the platform presented it as current.

The Forwarding Database (FDB), also called the MAC address table or CAM table, maps MAC addresses to switch ports. Topology inference engines use FDB data, cross-referenced with ARP tables and CDP/LLDP neighbor data, to deduce where endpoints are physically connected. The inference is probabilistic. It degrades as input data freshness degrades.

The core problem: standard MIBs do not expose how long ago an entry was last refreshed. Staleness must be inferred from polling deltas, and most teams do not compute it. The result: security investigations go to the wrong switch port, the wrong building, the wrong rack. Positioning queries return wrong answers with high confidence.

What this means

An FDB entry is learned when the switch receives a frame with a source MAC on a port. It ages out after a configurable timer with no traffic from that MAC on that port. If the MAC reappears on a different port, a new entry is learned and the old entry is overwritten or removed.

The failure occurs in the gap between “entry aged out” and “topology engine noticed.” During that gap, the topology engine still has the old entry (or no entry) in its cache. If the endpoint moved, the engine does not know the new location. If the endpoint went offline, the engine thinks it is still connected.

The ARP table adds a second layer of staleness. ARP maps IP addresses to MAC addresses, and ARP timeouts are typically much longer than MAC aging timeouts. This asymmetry is the core problem. On Arista EOS, MAC aging is 5 minutes (300 seconds) but ARP timeout is 4 hours (14400 seconds). On Cisco IOS, ARP timeout is commonly 4 hours. On Linux, the neighbor cache depends on several sysctl parameters (gc_stale_time, gc_thresh1/2/3, base_reachable_time); entries can persist well beyond the MAC aging timer in practice. A quiet host can lose its MAC-to-port mapping while its IP-to-MAC mapping persists for hours. The switch knows the IP belongs to a MAC but has forgotten which port that MAC is on. Unicast frames flood until the entry is relearned.

Platform-specific MAC aging defaults also vary. Cisco Catalyst and IOS typically default to 300 seconds. Cisco Nexus 7000 series defaults to 1800 seconds (30 minutes), meaning stale entries linger six times longer than on Catalyst. Linux bridge default aging time is 300 seconds, configurable via ip link set dev <br> type bridge ageing_time <value>. Check the unit before setting it: iproute2 commonly takes and prints this value in hundredths of a second, so the 300-second default shows as 30000. Setting this value to 0 disables aging entirely (entries never expire), which is a common misconfiguration that causes permanent staleness. Open vSwitch uses ovs-vsctl set bridge <br> other_config:mac-aging-time=<seconds>.
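To make the asymmetry concrete, the following minimal sketch computes how long a silent host can keep an ARP entry after its FDB entry has aged out. The timer values are the defaults quoted in this guide, and the names `DEFAULTS` and `blind_window` are hypothetical; verify the timers on your own devices before relying on the numbers.

```python
# Default timers in seconds, as quoted in this guide (assumption: your
# devices still run these defaults; confirm with the device configuration).
DEFAULTS = {
    "arista_eos": {"mac_aging": 300, "arp_timeout": 14400},
    "cisco_ios": {"mac_aging": 300, "arp_timeout": 14400},
}


def blind_window(mac_aging: int, arp_timeout: int) -> int:
    """Seconds during which ARP can still hold an entry that the FDB has lost."""
    return max(0, arp_timeout - mac_aging)


for platform, timers in DEFAULTS.items():
    window = blind_window(timers["mac_aging"], timers["arp_timeout"])
    print(f"{platform}: up to {window} s with ARP present but no FDB entry")
```

With the defaults above, the window is 14100 seconds, which is just under four hours of potential unknown-unicast flooding for a quiet host.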

flowchart TD
    A["Endpoint goes quiet<br/>or moves to new port"] --> B["MAC aging timer expires"]
    B --> C["FDB entry removed<br/>from old port"]
    C --> D["ARP entry persists<br/>timeout much longer"]
    D --> E["Topology engine<br/>next poll cycle"]
    E --> F["FDB: no entry or<br/>stale entry for MAC"]
    F --> G["ARP: IP to MAC<br/>still mapped"]
    G --> H["Positioning returns<br/>wrong or missing port"]
    H --> I["Unicast floods until<br/>MAC is relearned"]

Common causes

Cause	What it looks like	First thing to check
Aging timer asymmetry (ARP vs MAC)	Switch has ARP entry but no FDB entry for the MAC; unicast floods	Compare ARP timeout vs MAC aging timer on the device
Polling cadence longer than aging timer	Topology engine is always one cycle behind; entries expire between polls	Compare poll interval against MAC aging timer
Endpoint mobility (vMotion, failover)	MAC appears on new port but old entry persists in topology cache	Check FDB for the MAC across multiple switches in the VLAN
Stale entries not aged aggressively	Offline hosts remain in FDB for hours; positioning still reports high confidence	Check aging timer configuration; compute time since last refresh
FDB table full or eviction	Intermittent connectivity for previously-stable hosts; oldest entries evicted	Check FDB size against platform capacity
Aging disabled (value 0 on Linux bridge)	Entries never expire; offline hosts persist permanently	Check ageing_time setting; 0 means disabled
OVS static FDB aging bug (pre-Oct 2022)	Dynamic entries never age out when static entries coexist	Check OVS version against fix commit ccc24fc88d59

Quick checks

These are read-only commands safe to run during an incident.

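# Check FDB entry count on a switch (Q-BRIDGE-MIB, VLAN-aware)
# Walks the dot1qTpFdbPort column (.2); dot1qTpFdbAddress (.1) is an index object
# that agents normally do not return in a walk.
snmpwalk -v2c -c <community> <switch> .1.3.6.1.2.1.17.7.1.2.2.1.2 | wc -l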

# Show MAC address-table count via CLI
ssh <switch> 'show mac address-table count'

# Check ARP cache summary
ssh <device> 'show ip arp summary'

# Look up a specific MAC in the FDB (BRIDGE-MIB)
# Note: BRIDGE-MIB stores MAC as dotted-decimal octets in the OID index,
# e.g. aa:bb:cc:dd:ee:ff appears as 170.187.204.221.238.255 in the walk output.
snmpwalk -v2c -c <community> <switch> .1.3.6.1.2.1.17.4.3.1 | grep -F <dotted-decimal-mac>

# Check STP topology change count (spike triggers accelerated MAC aging)
snmpget -v2c -c <community> <switch> .1.3.6.1.2.1.17.2.4.0

# Check aging timer on a Linux bridge
# (iproute2 commonly prints hundredths of a second: the 300 s default shows as 30000)
ip -d link show br0 | grep ageing_time

# Check aging timer on OVS
ovs-vsctl get bridge br0 other_config:mac-aging-time

# Cross-reference ARP vs FDB for a specific endpoint
ssh <switch> 'show ip arp <IP>'
ssh <switch> 'show mac address-table address <MAC>'

# Check ARP cache via SNMP (IPv4 and IPv6 unified table)
snmpwalk -v2c -c <community> <device> .1.3.6.1.2.1.4.35.1

# Check OVS FDB eviction and movement counters (OVS 2.10+)
ovs-appctl fdb/stats-show <bridge>
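
The BRIDGE-MIB lookup above needs the MAC in dotted-decimal form. A small helper can do the conversion; this is an illustrative sketch, and the function name `mac_to_oid_suffix` is hypothetical.

```python
def mac_to_oid_suffix(mac: str) -> str:
    """Convert a MAC address to the dotted-decimal form used in the OID index.

    Accepts colon, hyphen, and Cisco dotted notations.
    """
    cleaned = mac.replace(":", "").replace("-", "").replace(".", "")
    if len(cleaned) != 12:
        raise ValueError(f"expected 12 hex digits, got {mac!r}")
    # bytes.fromhex raises ValueError on non-hex characters.
    return ".".join(str(octet) for octet in bytes.fromhex(cleaned))


print(mac_to_oid_suffix("aa:bb:cc:dd:ee:ff"))  # 170.187.204.221.238.255
```

The result can be passed to grep -F so the dots are matched literally rather than as regex wildcards.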

How to diagnose it

Identify the endpoint. Get the MAC address and last-known IP. If you only have an IP, look it up in ARP first.
Check the FDB for the MAC on the expected switch. Use show mac address-table address <MAC> or an SNMP walk of the Q-BRIDGE-MIB dot1qTpFdbPort column at .1.3.6.1.2.1.17.7.1.2.2.1.2 (the VLAN and MAC are encoded in the OID index; the dot1qTpFdbAddress column itself is normally not returned by a walk). If the entry is absent, the MAC has aged out or moved. If present, note the port and VLAN.
Check the ARP table for the IP. Use show ip arp <IP> or SNMP walk of ipNetToPhysicalEntry at .1.3.6.1.2.1.4.35.1. If ARP has the entry but FDB does not, you have aging timer asymmetry. The switch knows the IP-to-MAC mapping but has lost the MAC-to-port mapping.
Check neighboring switches. If the endpoint moved, the FDB entry may exist on a different switch in the same L2 domain. Walk the FDB on access switches connected to the same VLAN.
Check the aging timer configuration. Compare the MAC aging timer against your topology engine’s polling cadence. If aging (300s default) is shorter than or close to your poll interval, the topology engine is systematically behind reality.
Check STP topology changes. A spike in dot1dStpTopChanges at .1.3.6.1.2.1.17.2.4 indicates reconvergence. Under legacy STP (802.1D), a topology change causes the MAC aging timer to be shortened to the forwarding delay (default 15 seconds) for the duration of the TCN, approximately max_age + forward_delay (35 seconds with defaults). Under RSTP (802.1w), MAC table entries are flushed on ports receiving a topology change rather than shortening the global aging timer.
Check FDB capacity. If the FDB is near capacity, entries are evicted to make room for new ones. On OVS, ovs-appctl fdb/stats-show <bridge> exposes learning statistics, and the OVS eviction algorithm targets the oldest entry from the port with the highest number of FDB entries.
Check topology inference confidence. If your topology engine exposes a per-endpoint confidence score, check whether it dropped for the endpoint in question. Low confidence means the engine itself is uncertain, but many platforms present low-confidence results with high-confidence UI treatment.
Check endpoint positioning orphan rate. If your topology engine tracks MAC addresses it cannot resolve to a physical switch port, a rising orphan rate indicates systematic positioning failure.
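
The FDB and ARP checks in the steps above reduce to a small decision table. The sketch below encodes it under simplifying assumptions: a single expected switch port, one FDB lookup result, and hypothetical label names. It is a triage aid, not a replacement for checking neighboring switches.

```python
from typing import Optional


def classify(arp_has_ip: bool, fdb_port: Optional[str], expected_port: str) -> str:
    """Triage one endpoint from its ARP and FDB state on the expected switch.

    fdb_port is the port the FDB reports for the MAC, or None if absent.
    """
    if fdb_port is None and arp_has_ip:
        # ARP outlived the FDB entry: aging timer asymmetry, or the host moved.
        return "asymmetry"
    if fdb_port is None:
        # Nothing on this switch: walk neighboring switches in the VLAN.
        return "absent"
    if fdb_port != expected_port:
        # The platform's cached location disagrees with the live FDB.
        return "moved"
    return "consistent"


print(classify(arp_has_ip=True, fdb_port=None, expected_port="Gi1/0/24"))
```

A result of "asymmetry" or "absent" is the cue to walk the FDB on the other access switches in the same VLAN.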

Metrics and signals to monitor

Signal	Why it matters	Warning sign
FDB entry count	Approaching capacity causes flooding and eviction	Sustained growth; above 80% of platform limit
FDB entry freshness (computed from polling deltas)	Stale entries mislead endpoint positioning	No refresh in 3 to 4 times the aging interval
ARP cache entry count and staleness	Long-lived ARP entries outlive MAC entries, hiding stale mappings	ARP entries persisting beyond MAC aging timeout
STP topology change count	Reconvergence flushes and rebuilds FDB	Sudden spike in dot1dStpTopChanges
Topology view consistency (CDP/LLDP vs FDB vs ARP)	Disagreement reveals stale data or topology change in progress	Persistent inconsistency across sources
Topology inference confidence score	Low confidence means positioning is unreliable	Sustained drop, especially after mobility events
Endpoint positioning orphan rate	MACs unresolved to physical ports indicate positioning failure	Above 5% of discovered endpoints sustained
Per-port MAC count	Abnormal concentration on one port indicates daisy-chain or attack	Single access port with unexpectedly high count
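
Two of the warning signs in the table are simple ratios. The following sketch checks them for a single sample; the function name and return labels are hypothetical, and it deliberately ignores the "sustained" qualifier, which would need several consecutive samples.

```python
from typing import List


def fdb_warnings(
    fdb_entries: int,
    fdb_capacity: int,
    orphan_macs: int,
    discovered_endpoints: int,
) -> List[str]:
    """Return warning labels for one polling sample.

    Thresholds follow the table: above 80% of FDB capacity,
    above 5% of discovered endpoints unresolved to a port.
    """
    warnings = []
    # Integer arithmetic avoids float rounding at the exact threshold.
    if fdb_capacity > 0 and fdb_entries * 100 > fdb_capacity * 80:
        warnings.append("fdb_capacity")
    if discovered_endpoints > 0 and orphan_macs * 100 > discovered_endpoints * 5:
        warnings.append("orphan_rate")
    return warnings


print(fdb_warnings(8500, 10000, 10, 1000))
```

In practice you would raise an alert only when the same label appears across several consecutive polls.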

Fixes

Align aging timers across ARP and MAC

The most common root cause is asymmetric aging. If your platform allows tuning both timers, narrow the gap. On Cisco IOS, set MAC aging with mac address-table aging-time <seconds>. On Linux bridges, use ip link set dev <br> type bridge ageing_time <value> (iproute2 commonly expresses this value in hundredths of a second, so confirm the unit with ip -d link show first). On OVS, use ovs-vsctl set bridge <br> other_config:mac-aging-time=<seconds>.

Tradeoff: shorter aging timers cause more unknown-unicast flooding and relearning overhead. Longer timers cause more staleness. Virtualization environments with frequent vMotion benefit from shorter timers. Stable access-layer environments can tolerate longer ones.

Increase polling cadence for FDB and ARP

If your topology engine polls FDB every 5 minutes but MAC aging is 300 seconds, the engine is always one cycle behind: an entry can be learned and aged out between two polls without ever being observed. Either poll more often than the aging timer or lengthen the aging timer so entries survive across poll cycles.

Tradeoff: walking a large FDB table on a data center switch with 50,000 MAC entries can take seconds and spike device control-plane CPU. Target large-table walks to specific VLANs or use streaming telemetry where available.
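
The relationship between poll interval and aging timer can be checked mechanically. This is a rough sketch: the function name, the labels, and the rule of thumb of at least two polls per aging window are assumptions, not a standard.

```python
def polling_risk(poll_interval_s: int, mac_aging_s: int) -> str:
    """Compare the topology engine's poll interval with the MAC aging timer."""
    if poll_interval_s <= 0 or mac_aging_s <= 0:
        raise ValueError("intervals must be positive")
    if poll_interval_s > mac_aging_s:
        # An entry can be learned and aged out entirely between two polls.
        return "blind"
    if poll_interval_s * 2 > mac_aging_s:
        # Fewer than two polls per aging window (assumed rule of thumb).
        return "marginal"
    return "ok"


print(polling_risk(300, 300))  # marginal
```

For the 5-minute poll and 300-second aging example above, the result is "marginal"; a 60-second poll against the same timer returns "ok".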

Trigger gratuitous ARP after failover

After a failover event (HSRP, VRRP, F5 LTM HA), switches retain stale MAC and ARP entries pointing to the old active node. Explicitly clearing the MAC address table or triggering gratuitous ARPs accelerates convergence.

WARNING: clear mac address-table dynamic flushes all dynamically learned MAC entries on the switch. This causes temporary unicast flooding on all VLANs until entries are relearned. Run it only during a planned maintenance window or active failover. The same applies to clearing ARP (clear ip arp) on Cisco platforms.

Tradeoff: clearing the MAC table disrupts traffic briefly while entries are relearned. Gratuitous ARP is the less disruptive option because it refreshes only the entries for the node that sends it.

Patch or upgrade OVS if static entries block aging

Prior to a fix merged in late 2022 (commit ccc24fc88d59), OVS had a bug where the presence of static FDB entries prevented learned dynamic entries from aging out normally. If you are running an older OVS version and mixing static and dynamic FDB entries, upgrade.

Enable port-security to limit per-port MAC count

Port-security limits the number of MAC addresses learned per port, preventing stale or rogue entries from accumulating. Its absence is itself a finding.

Tradeoff: too-low limits disrupt legitimate virtualization hosts with many VM MACs. Tune per port class.
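
Before choosing a port-security limit, it helps to see how many MACs each port carries today. The sketch below counts distinct MACs per port from FDB rows you have already collected; the row format `(mac, port)` and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def ports_over_limit(fdb_rows: Iterable[Tuple[str, str]], limit: int) -> Dict[str, int]:
    """Return {port: distinct MAC count} for ports exceeding the limit."""
    # Deduplicate so a MAC seen in several VLANs on one port counts once.
    unique = {(mac.lower(), port) for mac, port in fdb_rows}
    counts = Counter(port for _mac, port in unique)
    return {port: n for port, n in counts.items() if n > limit}


rows = [
    ("aa:bb:cc:dd:ee:01", "Gi1/0/1"),
    ("aa:bb:cc:dd:ee:02", "Gi1/0/1"),
    ("aa:bb:cc:dd:ee:03", "Gi1/0/1"),
    ("aa:bb:cc:dd:ee:04", "Gi1/0/2"),
]
print(ports_over_limit(rows, limit=2))  # {'Gi1/0/1': 3}
```

Running this per port class (access, hypervisor uplink, trunk) gives a baseline for limits that will not disrupt legitimate virtualization hosts.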

Prevention

Compute FDB/ARP freshness from polling deltas. Standard MIBs do not expose “time since last refresh.” You must derive it by tracking when entries appear and disappear across poll cycles. Flag entries with no refresh in 3 to 4 times the aging interval as stale.
Track topology inference confidence per endpoint. Do not rely on aggregate confidence. Per-endpoint confidence tells you which specific endpoints have unreliable positioning. Endpoints behind wireless APs and VMs with MAC mobility often have persistently low confidence.
Cross-validate topology sources. Compare CDP/LLDP neighbor data against FDB and ARP. When three sources agree, confidence is high. When one source disagrees, investigate. Persistent inconsistency across sources for more than 24 hours indicates a stale-data problem.
Monitor endpoint positioning orphan rate. Track the percentage of MAC addresses the topology engine cannot resolve to a physical switch port. A rising orphan rate after network changes indicates systematic positioning failure.
Align aging timers across the L2 domain. Ensure MAC aging and ARP timeout are configured consistently across switches in the same broadcast domain. Mismatched timers between adjacent switches cause asymmetric staleness.
Verify aging is not disabled. On Linux bridges, ageing_time of 0 disables aging entirely. Entries never expire. Check this on every bridge in production.
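
Computing freshness from polling deltas can be sketched as a small tracker that records when each MAC was last observed in a poll. This is a minimal illustration under stated assumptions: the class name `FdbTracker`, the row format `(mac, vlan, port)`, and the default stale factor of 3 (the lower end of the 3 to 4 times guidance above) are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int]  # (mac, vlan)


@dataclass
class Sighting:
    port: str
    first_seen: float
    last_seen: float


class FdbTracker:
    """Tracks when each (MAC, VLAN) was last observed across FDB polls."""

    def __init__(self, aging_s: float, stale_factor: float = 3.0) -> None:
        self.aging_s = aging_s
        self.stale_factor = stale_factor
        self.entries: Dict[Key, Sighting] = {}

    def record_poll(self, now: float, rows: Iterable[Tuple[str, int, str]]) -> None:
        """Update sightings from one poll; rows are (mac, vlan, port)."""
        for mac, vlan, port in rows:
            key = (mac.lower(), vlan)
            seen = self.entries.get(key)
            if seen is None or seen.port != port:
                # New MAC, or the MAC moved: restart the observation window.
                self.entries[key] = Sighting(port, now, now)
            else:
                seen.last_seen = now

    def stale(self, now: float) -> List[Key]:
        """Keys not observed for more than stale_factor * aging_s seconds."""
        cutoff = self.stale_factor * self.aging_s
        return sorted(k for k, s in self.entries.items() if now - s.last_seen > cutoff)


tracker = FdbTracker(aging_s=300)
tracker.record_poll(0, [("AA:BB:CC:DD:EE:FF", 10, "Gi1/0/24"), ("00:11:22:33:44:55", 10, "Gi1/0/1")])
tracker.record_poll(600, [("00:11:22:33:44:55", 10, "Gi1/0/1")])
print(tracker.stale(now=901))
```

Note that an entry being present in a poll only proves it was refreshed at some point within the preceding aging window, so `last_seen` is an upper bound on the true last refresh, not an exact timestamp.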

How Netdata helps

Correlate FDB entry count with STP topology changes. Collect dot1dStpTopChanges and FDB entry count via SNMP in the same dashboard. A spike in topology changes followed by FDB churn is the signature of forced MAC relearning.
Alert on FDB capacity thresholds. Set alarms on FDB entry count approaching platform limits to catch table exhaustion before it causes eviction and flooding.
Track ARP cache size alongside FDB size. Divergence between ARP entry count and FDB entry count is the core symptom of aging timer asymmetry. Collect both via SNMP and alert when the gap grows.
