Interface saturation: measuring utilization against ifHighSpeed correctly
Interface utilization is one of the most frequently miscomputed network metrics. The formula is simple: divide throughput by capacity, multiply by 100. But the SNMP objects for throughput and capacity have different precision, units, and failure modes. An interface that is genuinely saturated can report 0% utilization. A healthy link can report 300%.
The root cause is almost always the denominator. ifSpeed (IF-MIB .1.3.6.1.2.1.2.2.1.5) is a 32-bit Gauge that caps at 4,294,967,295 bps, approximately 4.29 Gbps. For any link faster than that, ifSpeed saturates at its maximum value and utilization computed against it is wrong. The correct denominator is ifHighSpeed (.1.3.6.1.2.1.31.1.1.1.15), which reports speed in units of 1,000,000 bps (Mbps) and has no practical upper bound. A value of 10000 means 10 Gbps; 100000 means 100 Gbps.
Thresholds and counter requirements
Sustained utilization above 80% on a critical interface is a capacity concern. Above 95% it typically means congestion, with drops and latency following.
The metric depends on two SNMP inputs: an octet counter for throughput and a speed value for capacity. Both have 32-bit and 64-bit variants, and selecting the wrong variant produces silently incorrect results. The 32-bit octet counter ifInOctets wraps every 3.4 seconds on a 10G link at line rate, every 34 seconds on a 1G link. The 64-bit counter ifHCInOctets effectively never wraps. Using 32-bit counters with modern link speeds produces fake spikes, negative deltas, or zero utilization on a saturated link.
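The wrap times quoted above follow directly from the size of a 32-bit counter. As a minimal sketch (the function name `counter32_wrap_seconds` is illustrative, not part of any library), the arithmetic looks like this:

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 rolls over after 4,294,967,296 octets


def counter32_wrap_seconds(link_bps: float) -> float:
    """Seconds for a 32-bit octet counter to wrap at sustained line rate."""
    if link_bps <= 0:
        raise ValueError("link_bps must be positive")
    octets_per_second = link_bps / 8  # counters count octets, links are rated in bits
    return COUNTER32_MODULUS / octets_per_second


for label, bps in [("100M", 100e6), ("1G", 1e9), ("10G", 10e9)]:
    print(f"{label}: wraps after {counter32_wrap_seconds(bps):.1f} s at line rate")
```

Any poll interval longer than the printed value can span one or more complete wraps, which is why the 64-bit counters are the safer default.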
Both ifSpeed and ifHighSpeed remain STATUS current in RFC 2863. Neither has been deprecated. But ifSpeed is functionally useless for any link at or above 4.29 Gbps. If your network has 10G, 25G, 40G, or 100G links, ifHighSpeed is not optional.
The correct formula and counter selection
Compute utilization as:
utilization_pct = (delta_octets * 8) / (delta_seconds * ifHighSpeed * 1000000) * 100
Where:
- delta_octets is the difference between two consecutive polls of ifHCInOctets (ingress) or ifHCOutOctets (egress)
- delta_seconds is the elapsed time between the two polls
- ifHighSpeed is the interface speed in Mbps (units of 1,000,000 bps)
- The factor of 8 converts octets to bits
In PromQL, the equivalent expression is:
rate(ifHCInOctets[5m]) * 8 / ifHighSpeed / 1000000 * 100
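For collectors that compute the value themselves, the same formula can be sketched in Python. The function name and the choice to return `None` for unusable inputs are assumptions made for illustration; a real collector may prefer to raise or log instead.

```python
from typing import Optional


def utilization_pct(delta_octets: int,
                    delta_seconds: float,
                    if_high_speed_mbps: int) -> Optional[float]:
    """Utilization in percent from a 64-bit octet delta and ifHighSpeed (Mbps).

    Returns None when the inputs cannot produce a meaningful value
    (zero speed, non-positive interval, or a negative delta).
    """
    if delta_seconds <= 0 or if_high_speed_mbps <= 0 or delta_octets < 0:
        return None
    bits_transferred = delta_octets * 8
    capacity_bits = delta_seconds * if_high_speed_mbps * 1_000_000
    return bits_transferred / capacity_bits * 100


# 150 GB moved in 300 s on a 10G link (ifHighSpeed = 10000)
print(utilization_pct(150_000_000_000, 300, 10_000))
# ifHighSpeed of 0 yields None rather than a misleading 0%
print(utilization_pct(150_000_000_000, 300, 0))
```

Returning `None` instead of 0 keeps an invalid denominator from looking like an idle interface.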
The OIDs you need:
| Purpose | OID | Type | Notes |
|---|---|---|---|
| Ingress octets (64-bit) | .1.3.6.1.2.1.31.1.1.1.6 | Counter64 | ifHCInOctets. Use for all links >= 100 Mbps |
| Egress octets (64-bit) | .1.3.6.1.2.1.31.1.1.1.10 | Counter64 | ifHCOutOctets |
| Interface speed | .1.3.6.1.2.1.31.1.1.1.15 | Gauge32 | ifHighSpeed, in Mbps |
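| Counter discontinuity time | .1.3.6.1.2.1.31.1.1.1.19 | TimeStamp (TimeTicks) | ifCounterDiscontinuityTime. Zero means no discontinuity since the agent last restarted; a change between polls means the counters were reset |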
To collect these manually (read-only, safe for production):
# Poll 64-bit octet counters and speed for interface utilization
snmpwalk -v2c -c <community> <device> .1.3.6.1.2.1.31.1.1.1.6 # ifHCInOctets
snmpwalk -v2c -c <community> <device> .1.3.6.1.2.1.31.1.1.1.10 # ifHCOutOctets
snmpwalk -v2c -c <community> <device> .1.3.6.1.2.1.31.1.1.1.15 # ifHighSpeed (Mbps)
snmpwalk -v2c -c <community> <device> .1.3.6.1.2.1.31.1.1.1.19 # ifCounterDiscontinuityTime
flowchart TD
A["Interface to measure"] --> B{"ifHighSpeed > 0?"}
B -- No --> X["Denominator invalid
check vendor bug or
disconnected port"]
B -- Yes --> C{"Speed >= 1 Gbps?"}
C -- Yes --> D["ifSpeed saturates at 4.29 Gbps
ifHighSpeed is mandatory"]
C -- No --> E["ifSpeed usable but
ifHighSpeed still preferred"]
D --> F["Poll ifHCInOctets / ifHCOutOctets
Counter64, not 32-bit ifInOctets"]
E --> F
F --> G["Compute delta between polls
Check ifCounterDiscontinuityTime"]
G --> H["Apply utilization formula
from section above"]
H --> I{"Result 0 to 100 pct?"}
I -- No --> J["Investigate:
counter wrap, wrong OID,
or vendor bug"]
I -- Yes --> K["Valid utilization reading"]

Some monitoring platforms poll ifSpeed first and fall back to ifHighSpeed only when it returns 0 or 4294967295. This works but adds a needless failure surface. Defaulting to ifHighSpeed directly is simpler and avoids the edge case for any deployment with links at or above 1 Gbps.
Where it breaks: vendor bugs and unit confusion
Even with the correct formula and OIDs, vendor implementations can produce wrong numbers.
ifHighSpeed returns 0 on disconnected or idle ports. Some vendors return 0 for ifHighSpeed on interfaces that are administratively up but operationally down, or on disconnected ports. This produces a division-by-zero in the utilization formula, which most platforms handle as either 0% utilization or a suppressed calculation. The interface looks idle when it may be non-functional.
ifHighSpeed returns implausible values on disconnected ports. Some switch OSes have been observed returning values in the petabits-per-second range for ifHighSpeed on disconnected ports. Any utilization alert that fires with a physically impossible denominator should be checked against ifOperStatus first.
Wrong unit convention. Some SNMP agents reportedly return ifHighSpeed in bits per second rather than the RFC-mandated units of 1,000,000 bps. In these cases, ifHighSpeed matches ifSpeed numerically (for example, both return 1000000000 for a 1G link). The utilization formula then divides by a denominator that is 1,000,000 times too large, producing a near-zero reading on a fully saturated link. Detect this by checking whether ifHighSpeed and ifSpeed match numerically. Under a correct implementation, the two differ by a factor of 1,000,000 on any link that ifSpeed can represent (up to about 4.29 Gbps); on faster links ifSpeed is pinned at 4294967295 and the comparison no longer applies.
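The zero, implausible-value, and wrong-unit cases above can be screened before a reading is used as a denominator. The following is a sketch only: the function name, the returned labels, and the `max_plausible_mbps` ceiling (10 Tbps here) are assumptions chosen for illustration and should be tuned to the fastest interface actually deployed.

```python
def check_speed_reading(if_speed_bps: int,
                        if_high_speed: int,
                        max_plausible_mbps: int = 10_000_000) -> str:
    """Classify an ifHighSpeed reading before using it as a denominator.

    max_plausible_mbps is an assumed ceiling (10 Tbps), not an RFC value.
    """
    if if_high_speed == 0:
        return "zero"  # down or disconnected port, or a vendor bug
    if if_high_speed == if_speed_bps:
        # A correct agent never returns the same number in Mbps and in bps.
        return "suspect_bps_units"
    if if_high_speed > max_plausible_mbps:
        return "implausible"
    return "ok"


print(check_speed_reading(1_000_000_000, 1_000))          # correct 1G link
print(check_speed_reading(1_000_000_000, 1_000_000_000))  # ifHighSpeed reported in bps
print(check_speed_reading(4_294_967_295, 10_000))         # correct 10G link, ifSpeed capped
```

A reading that is not `ok` is better treated as "no utilization value" than as 0%, and cross-checked against ifOperStatus.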
Fixed-speed reporting bug. Some switch models report a fixed 10 Mbps for both ifSpeed and ifHighSpeed regardless of the actual link speed. This causes utilization to exceed 100% on any faster link, triggering false saturation alerts. This is a firmware bug, not a protocol issue.
Cached stale speed value. If the monitoring platform polls the speed denominator infrequently and caches it, any of the above bugs persist until the cache expires. A speed value cached once per day amplifies a transient reporting error into a sustained wrong reading.
Common misuses
Using 32-bit ifInOctets on high-speed links. ifInOctets wraps every 3.4 seconds at 10G line rate, every 34 seconds at 1G, every 5.7 minutes at 100M. Any poll interval longer than the wrap time produces a counter that has rolled over between polls. Naive differencing then computes either a massive fake spike (when the wrapped delta is treated as unsigned) or a fake zero (when the wrapped delta is negative and discarded). The 64-bit HC counters are mandatory for any link at or above 100 Mbps.
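Wrap-aware (modular) differencing does not rescue a 32-bit counter once more than one wrap fits inside a poll interval. The sketch below models a counter that starts at zero and a 10G link held at line rate for a 5-minute poll; the helper name `counter32_view` is illustrative.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 rolls over after 4,294,967,296 octets


def counter32_view(true_octets: int):
    """Return (missed_wraps, observed_delta) for a Counter32 that started at 0."""
    return divmod(true_octets, COUNTER32_MODULUS)


line_rate_bps = 10_000_000_000
poll_seconds = 300
true_octets = line_rate_bps // 8 * poll_seconds  # octets actually transferred

wraps, observed = counter32_view(true_octets)
observed_pct = observed * 8 / (poll_seconds * line_rate_bps) * 100

print(f"true octets:     {true_octets}")
print(f"missed wraps:    {wraps}")
print(f"observed delta:  {observed}")
print(f"reported util:   {observed_pct:.2f}% on a link that was at 100%")
```

The poller only ever sees the remainder, so the link reports well under 1% utilization while it is fully saturated.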
Using ifSpeed as the denominator for links above 4.29 Gbps. ifSpeed saturates at 4,294,967,295 bps. A 10G link reports ifSpeed = 4294967295, which is only 43% of the real 10 Gbps capacity. Utilization computed against this denominator is overestimated by a factor of approximately 2.3: a link at 43% real utilization reports 100%, and a link at 80% reports approximately 186%.
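The overestimation factor can be checked with a few lines of arithmetic. This sketch assumes an agent that follows the RFC and reports the capped maximum for links ifSpeed cannot represent; the function name is illustrative.

```python
IFSPEED_MAX_BPS = 4_294_967_295  # largest value a Gauge32 ifSpeed can report


def reported_pct_with_ifspeed(real_pct: float, real_capacity_bps: float) -> float:
    """Utilization a poller reports when it divides by a (possibly capped) ifSpeed."""
    denominator = min(real_capacity_bps, IFSPEED_MAX_BPS)
    return real_pct * real_capacity_bps / denominator


for real in (43, 80):
    shown = reported_pct_with_ifspeed(real, 10e9)
    print(f"10G link at {real}% real utilization reports {shown:.1f}%")

# A 1G link fits in ifSpeed, so the reading is unchanged.
print(reported_pct_with_ifspeed(80, 1e9))
```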
Ignoring counter discontinuities. When a device reboots or an interface resets, counters jump to zero. ifCounterDiscontinuityTime records when the last reset occurred. If this value changes between polls, the delta for that interval is invalid and should be discarded, not reported as a utilization value.
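One way to apply this rule is to gate every delta on the discontinuity timestamp before it reaches the utilization formula. The sketch below is an assumption about how a collector might structure that check; `PollSample` and `valid_delta` are hypothetical names, not part of any SNMP library.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PollSample:
    poll_time: float          # collector clock, in seconds
    hc_octets: int            # ifHCInOctets or ifHCOutOctets
    discontinuity_ticks: int  # ifCounterDiscontinuityTime as polled


def valid_delta(prev: PollSample, curr: PollSample) -> Optional[Tuple[int, float]]:
    """Return (delta_octets, delta_seconds), or None if the interval is unusable."""
    if curr.discontinuity_ticks != prev.discontinuity_ticks:
        return None  # counters were reset somewhere inside this interval
    elapsed = curr.poll_time - prev.poll_time
    if elapsed <= 0:
        return None  # duplicate or out-of-order sample
    delta = curr.hc_octets - prev.hc_octets
    if delta < 0:
        return None  # a Counter64 going backwards is a reset, not a wrap
    return delta, elapsed


before = PollSample(poll_time=0.0, hc_octets=9_000_000_000, discontinuity_ticks=0)
after_reset = PollSample(poll_time=300.0, hc_octets=1_200_000, discontinuity_ticks=4_500)
print(valid_delta(before, after_reset))  # None: the interval is discarded
```

The negative-delta check is a backstop for agents that reset counters without updating ifCounterDiscontinuityTime.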
Treating sub-interface utilization as independent. Sub-interfaces (VLAN SVIs, tunnel interfaces) share the bandwidth of their parent physical interface. A sub-interface can show 100% utilization while the physical interface is at 30%, or vice versa. Aggregate sub-interface counters to the physical level before computing utilization for capacity planning.
Relying on 5-minute polling to detect saturation. Microbursts at line rate can fill output queues and cause discards in sub-second windows that are invisible at 5-minute polling granularity. A link that averages 40% over 5 minutes may have hit 100% for 200ms multiple times during that window, causing ifOutDiscards to increment. If you see discard counters rising on an interface that reports moderate utilization, suspect microbursts. Shorter polling narrows the averaging window but still cannot resolve sub-second bursts, so on critical interfaces treat the discard counters as the primary signal and use 1-minute polling or flow-based analysis to narrow down when the bursts occur.
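A short calculation shows how little a handful of line-rate bursts moves a 5-minute average. The numbers below (20 bursts of 200 ms over a 40% baseline) are illustrative assumptions, and the function name is hypothetical.

```python
def windowed_average_pct(window_seconds: float,
                         burst_seconds: float,
                         burst_count: int,
                         burst_pct: float = 100.0,
                         baseline_pct: float = 0.0) -> float:
    """Average utilization over a window containing short bursts above a baseline."""
    busy = burst_seconds * burst_count
    if busy > window_seconds:
        raise ValueError("bursts do not fit inside the window")
    quiet = window_seconds - busy
    return (busy * burst_pct + quiet * baseline_pct) / window_seconds


avg = windowed_average_pct(window_seconds=300, burst_seconds=0.2,
                           burst_count=20, baseline_pct=40.0)
print(f"5-minute average with 20 line-rate bursts: {avg:.1f}%")
```

Twenty saturation events raise the average by less than one percentage point, which is why discard counters are the more reliable signal.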
Signals to watch in production
| Signal | Why it matters | Warning sign |
|---|---|---|
| ifHighSpeed value | Correct denominator. Wrong value means every utilization reading is wrong | Value of 0, value matching ifSpeed exactly, or physically impossible value |
| ifHCInOctets / ifHCOutOctets delta | Actual throughput numerator. Must use 64-bit counters | Sudden terabit-scale spike (counter wrap) or zero on a known-active link |
| ifCounterDiscontinuityTime | Detects counter resets that invalidate the delta | Non-zero value or value changing between polls |
| ifOperStatus | Interface must be up for utilization to be meaningful | Down status with non-zero utilization (stale data) |
| ifInDiscards / ifOutDiscards | Leading indicator of congestion. Discards can begin before utilization reaches 100% due to microbursts | Rising discard rate on an interface reporting under 80% average utilization |
| ifInErrors / ifOutErrors | Physical-layer degradation that compounds with saturation | Rising error rate correlated with high utilization |
| Poll interval vs. counter wrap time | If poll interval exceeds wrap time, deltas are unreliable | 32-bit counter on 10G link with 5-minute polling (wrap time ~3.4 seconds) |
How Netdata helps
- Interface utilization charts use 64-bit HC counters (ifHCInOctets, ifHCOutOctets) against ifHighSpeed as the denominator, avoiding both the 32-bit wrap and the ifSpeed cap on high-speed links.
- Utilization is charted per-direction (ingress and egress separately), making asymmetric saturation immediately visible.
- Utilization, discards, and error counters appear on the same interface dashboard, letting you distinguish sustained congestion (high utilization plus rising discards) from microburst drops (moderate average utilization with rising ifOutDiscards).
- Counter discontinuity detection suppresses invalid deltas after device reboots or interface resets, preventing fake spikes from corrupting charts.
- Per-interface alarm templates support configurable severity tiers by interface role, so a backup link at 95% during a scheduled window does not page the same as a production uplink at 95% during business hours.
Related guides
- Asymmetric routing: why your path and latency measurements lie
- Flow export-to-ingest latency: why your NetFlow data is minutes behind
- Cold-start topology: why your map is incomplete after a collector restart
- BGP session Established but stale: detecting silent route loss
- NetFlow storage sizing: how much disk your flow collector really needs







