SNMP trap receiver dropping traps: silent UDP/162 loss

When SNMP traps silently disappear, the first place to look is rarely the device. SNMP traps are push-based UDP datagrams on port 162. The kernel buffers them, and the receiver application (typically snmptrapd or a commercial collector) must drain that buffer faster than it fills. If it does not, the kernel silently drops datagrams and increments a counter the application never sees. No error is logged, and no alert fires.

During an incident, failing devices emit more traps, not fewer. A link-flap cascade or STP reconvergence can produce thousands of traps per second. The socket buffer overflows precisely when high-value traps arrive in the same burst as noise. The application layer has no visibility into these drops; snmptrapd logs only what it decodes. Your dashboard shows what it received, not what the device sent. The only authoritative signal lives in the kernel’s Udp_RcvbufErrors counter: the RcvbufErrors field of the Udp: line in /proc/net/snmp, which nstat reports as UdpRcvbufErrors.

Where traps are lost

A trap datagram can die silently at three points between the wire and your log file. Only one of those points is visible at the interface level. The other two are invisible unless you explicitly monitor for them.

flowchart TD
    A["Device sends trap UDP/162"] --> B["NIC RX ring buffer"]
    B --> C{"Ring overflow?"}
    C -- yes --> D["NIC drop: /proc/net/dev"]
    C -- no --> E["Kernel UDP socket buffer"]
    E --> F{"Buffer full?"}
    F -- yes --> G["Kernel drop: Udp_RcvbufErrors"]
    F -- no --> H["snmptrapd reads PDU"]
    H --> I{"ACL / authCommunity pass?"}
    I -- no --> J["Silent discard: no log"]
    I -- yes --> K["Trap logged and handled"]
    G --> L["Invisible to application"]
    J --> L
    D --> L

NIC ring buffer overflows happen before the kernel sees the packet. Socket buffer overflows happen after the kernel receives the packet but before snmptrapd reads it. Access control rejection (net-snmp 5.3+) happens after snmptrapd decodes the PDU but before it logs anything. All three produce identical symptoms: a trap the device swears it sent never appears in your system.
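The three loss points can be told apart by which counter moved during the same time window. The sketch below is one way to express that decision, not a tool from net-snmp: classify_loss and its four arguments are hypothetical names, and you would collect the numbers yourself (NIC RX drop delta, Udp_RcvbufErrors delta, datagrams seen by tcpdump, traps logged). It assumes UDP/162 dominates UDP traffic on the host, because the kernel counter is system-wide.

```shell
#!/bin/sh
# classify_loss NIC_DROP_DELTA RCVBUF_ERR_DELTA ON_WIRE LOGGED
# Hypothetical helper: every argument is a count taken over the same window.
#   NIC_DROP_DELTA    increase in RX drops for the collector NIC
#   RCVBUF_ERR_DELTA  increase in the kernel UDP RcvbufErrors counter
#   ON_WIRE           trap datagrams counted by a packet capture
#   LOGGED            traps that reached the snmptrapd log
classify_loss() {
    nic_delta=$1; rcvbuf_delta=$2; on_wire=$3; logged=$4
    found=0
    if [ "$nic_delta" -gt 0 ]; then echo "nic-ring"; found=1; fi
    if [ "$rcvbuf_delta" -gt 0 ]; then echo "socket-buffer"; found=1; fi
    # A capture sees packets after the NIC ring but before the socket buffer,
    # so whatever the socket-buffer drops do not explain was lost in the daemon.
    unexplained=$((on_wire - logged - rcvbuf_delta))
    if [ "$unexplained" -gt 0 ]; then echo "daemon-discard"; found=1; fi
    if [ "$found" -eq 0 ]; then echo "no-loss-seen"; fi
}

# Example: classify_loss 0 40 100 60   prints socket-buffer
```

More than one stage can be reported at once, since a burst that overflows the socket buffer can also carry traps that fail access control.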

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Kernel socket buffer overflow | Traps vanish during bursts; gaps in the log; device confirms it sent them | nstat -az UdpRcvbufErrors |
| Slow traphandle script blocking the daemon | snmptrapd serializes trap delivery; a slow handler script causes backlog; buffer fills behind it | Remove or time the handler script |
| Access control silently discarding (net-snmp 5.3+) | Traps arrive at the socket but never appear in the log; zero traps from a specific device or after an upgrade | Check snmptrapd.conf for matching authCommunity or authUser rules |
| NIC RX ring buffer overflow | Udp_RcvbufErrors is low but /proc/net/dev shows rising RX drops on the collector NIC | ethtool -S eth0 |
| Firewall dropping UDP 162 | Zero traps from specific sources or all sources; kernel never receives the datagram | iptables -L INPUT -v -n or nft list ruleset |
| RSS funneling all IRQs to one core | One CPU core pinned at 100% during bursts while others idle; buffer overflows despite aggregate headroom | cat /proc/interrupts |

Quick checks

These commands are safe and read-only. Run them on the trap receiver host; tcpdump needs root or CAP_NET_RAW.

# Confirm snmptrapd is listening on UDP 162
ss -lun '( sport = :162 )'

# Check kernel-side UDP receive buffer drops (the canonical silent-loss signal)
# nstat names the counter without an underscore
nstat -az UdpRcvbufErrors

# Same counter from /proc/net/snmp (look at the RcvbufErrors column)
cat /proc/net/snmp | grep '^Udp:'

# Current socket queue depth and buffer limits for the listener
ss -lun '( sport = :162 )' -m

# Verify traps are physically arriving on the NIC (not a network problem)
tcpdump -i eth0 -nn 'udp port 162' -c 100

# Check NIC-level drops (happen before the socket layer)
cat /proc/net/dev

# Check ethtool drop counters for hardware-level resource exhaustion
ethtool -S eth0 | grep -i drop

# Per-core CPU during a burst (RSS diagnostic: one core at 100% is suspicious)
mpstat -P ALL 1 5

# Check current kernel buffer tunables
sysctl net.core.rmem_max net.core.rmem_default

# Inspect recent snmptrapd log entries for gaps or format issues
tail -100 /var/log/snmptrapd.log
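
The Udp: rows in /proc/net/snmp come as a header line followed by a value line, so picking the column by name is safer than counting fields by eye. A minimal sketch, assuming that two-line layout; udp_counter is a hypothetical helper name, not a standard tool:

```shell
#!/bin/sh
# udp_counter COLUMN [FILE]
# Print one named UDP counter from a /proc/net/snmp style file.
# The first "Udp:" line holds column names, the second holds values.
udp_counter() {
    awk -v col="$1" '
        $1 == "Udp:" && !idx {
            # Header row (or no match yet): look for the requested column.
            for (i = 2; i <= NF; i++) if ($i == col) idx = i
            next
        }
        $1 == "Udp:" && idx { print $idx; exit }
    ' "${2:-/proc/net/snmp}"
}

# Usage on the collector:
#   udp_counter RcvbufErrors
```

The function prints nothing if the column does not exist, which lets a caller distinguish "counter is zero" from "counter not found".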

How to diagnose it

  1. Check Udp_RcvbufErrors first. This is the single most important counter. If it is incrementing, traps are reaching the kernel but the socket buffer is full. The application is not draining fast enough. This is a collector-side problem, not a network problem.
  2. Confirm traps are arriving on the wire. Run tcpdump -i eth0 -nn 'udp port 162' -c 100. If you see packets, the network path is fine and the problem is local. If you see nothing, the issue is upstream: firewall, routing, or the device is not sending.
  3. Check the socket queue depth. Run ss -lun '( sport = :162 )' -m during a burst. If Recv-Q sits near the buffer ceiling, the application is the bottleneck. A healthy listener should drain the queue to near-zero between samples.
  4. Check for a slow traphandle script. snmptrapd delivers traps to handler programs serially. A handler that shells out to a database insert, an API call, or a complex MIB lookup will block the daemon. The socket buffer fills behind the blocked process. Temporarily remove the traphandle directive from snmptrapd.conf, restart, and see if drops stop. If they do, the handler is the bottleneck.
  5. Verify access control if running net-snmp 5.3 or later. Since version 5.3, snmptrapd enforces access control by default. Incoming traps without a matching authCommunity, authUser, or disableAuthorization yes directive in snmptrapd.conf are silently discarded after decoding. This produces identical symptoms to buffer overflow: the trap arrives but never appears in the log.
  6. Check NIC-level drops. If Udp_RcvbufErrors is flat but /proc/net/dev shows RX drops on the collector NIC, packets are being lost at the ring buffer level, before the kernel socket layer. The fix is different: increase the NIC ring buffer with ethtool -G, not the socket buffer.
  7. Check RSS distribution. Run cat /proc/interrupts | grep eth0 and mpstat -P ALL 1 5. If all packet receive interrupts land on one core while others are idle, RSS is misconfigured. That core saturates at 100% softirq, cannot drain the socket buffer fast enough, and Udp_RcvbufErrors climbs. The aggregate CPU looks fine, which is why this is one of the most under-diagnosed causes.
  8. Verify dual-stack binding if you expect IPv6 traps. snmptrapd binds IPv4 by default. Binding udp6:[::]:162 does not automatically cover IPv4 on all platforms. Use snmpTrapdAddr udp:162,udp6:162 in snmptrapd.conf to listen on both stacks.
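
For step 3, the skmem line that ss -m prints carries the queued bytes (r), the buffer limit (rb), and, on reasonably recent iproute2 releases, a per-socket drop count (d). The sketch below turns that line into a fill percentage. skmem_usage is a hypothetical helper, and it assumes the skmem:(r…,rb…,…,d…) layout; older ss versions that omit the d field produce no output.

```shell
#!/bin/sh
# skmem_usage: read `ss -lun -m` output on stdin and summarize each socket's
# receive queue as bytes queued, buffer limit, fill percentage, and drops.
skmem_usage() {
    sed -n 's/.*skmem:(r\([0-9]*\),rb\([0-9]*\),.*,d\([0-9]*\)).*/\1 \2 \3/p' |
    awk '{
        pct = ($2 > 0) ? 100 * $1 / $2 : 0
        printf "queued=%d limit=%d fill=%d%% drops=%d\n", $1, $2, pct, $3
    }'
}

# Usage on the collector during a burst:
#   ss -lun -m "( sport = :162 )" | skmem_usage
```

A fill value that stays high across several samples points at the application, matching the Recv-Q guidance in step 3.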

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Udp_RcvbufErrors (/proc/net/snmp) | The authoritative counter for socket-buffer-level trap loss. Nonzero means datagrams arrived at the kernel but were dropped because the buffer was full. | Any increment is abnormal. A sustained rate means active data loss. |
| NIC RX drops (/proc/net/dev) | Drops at the ring buffer level happen before the socket layer. Different root cause, different fix. | Rising RX drops on the collector’s trap-ingress NIC. |
| Socket Recv-Q depth (ss -lun -m) | Shows whether the application is keeping up in real time. | Recv-Q approaching buffer ceiling during bursts. |
| Trap receive rate per device | Establishes which devices are noisy. A single device dominating trap volume is a finding. | Sudden spike from one device, or total silence from a normally chatty device. |
| Per-core CPU %soft (mpstat) | Softirq time on the receive core indicates packet processing load. RSS misconfiguration funnels this to one core. | One core at 100% %soft while others are idle. |
| idgmerr/s (sar -n UDP 1) | Input datagram errors per second. Corroborates Udp_RcvbufErrors with a rate view, though it includes non-buffer errors. | Nonzero rate during traffic bursts. |
| net.core.rmem_max and rmem_default | The system-wide ceiling and default for socket receive buffers. If these are at the Linux default of 4 MB, high-pps collectors will overflow. | Value below 16 MB on a production trap or flow collector. |
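
Most of these signals are monotonic counters, so the useful number is the change between two samples divided by the time between them. A small sketch of that arithmetic; counter_rate is a hypothetical name, and treating a decreasing counter as a reset (for example after a reboot) is an assumption of this example:

```shell
#!/bin/sh
# counter_rate PREVIOUS CURRENT SECONDS
# Convert two samples of a monotonic counter into a per-second rate.
counter_rate() {
    awk -v prev="$1" -v cur="$2" -v secs="$3" 'BEGIN {
        # A counter that went backwards was reset; do not report a bogus rate.
        if (secs <= 0 || cur < prev) { print "reset"; exit }
        printf "%.2f\n", (cur - prev) / secs
    }'
}

# Example: 60 drops over a 60-second window
#   counter_rate 100 160 60    prints 1.00
```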

Fixes

Increase the socket receive buffer

The Linux default net.core.rmem_max is 4,194,304 bytes (4 MB). This is inadequate for trap receivers handling bursty traffic from many devices.

# Immediate (runtime, non-persistent)
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.rmem_default=16777216

# Persistent via /etc/sysctl.d/
echo "net.core.rmem_max=16777216" > /etc/sysctl.d/99-snmp-trap.conf
echo "net.core.rmem_default=16777216" >> /etc/sysctl.d/99-snmp-trap.conf
sysctl -p /etc/sysctl.d/99-snmp-trap.conf

Production deployments should target 16 MB or higher. For very high-volume collectors, 32 MB (33554432 bytes) is not unreasonable.
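
To confirm a host actually meets that target, compare both tunables against it. The sketch below takes the values as arguments so it can be fed from sysctl, /proc, or a fleet inventory. check_rmem is a hypothetical helper, and the 16777216 default simply mirrors the 16 MB target above.

```shell
#!/bin/sh
# check_rmem RMEM_MAX RMEM_DEFAULT [TARGET_BYTES]
# Report which receive-buffer tunables fall below the target.
# Returns 0 when both meet it, 1 otherwise.
check_rmem() {
    target=${3:-16777216}
    status=0
    if [ "$1" -lt "$target" ]; then
        echo "rmem_max=$1 is below $target"; status=1
    fi
    if [ "$2" -lt "$target" ]; then
        echo "rmem_default=$2 is below $target"; status=1
    fi
    if [ "$status" -eq 0 ]; then echo "ok"; fi
    return "$status"
}

# Usage on a Linux host:
#   check_rmem "$(cat /proc/sys/net/core/rmem_max)" \
#              "$(cat /proc/sys/net/core/rmem_default)"
```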

snmptrapd does not expose a socket buffer directive of its own in snmptrapd.conf. The net-snmp library does document a serverRecvBuf token in snmp.conf(5) that requests a larger SO_RCVBUF, capped by net.core.rmem_max; whether your snmptrapd build applies it to the trap listener is worth confirming with ss -m after a restart. If it does not, the remaining path is raising net.core.rmem_default system-wide, which affects all UDP sockets on the host. This is a trade-off: it fixes the trap receiver but also changes buffer behavior for syslog, flow collectors, and any other UDP listener on the same machine.

If you run a Java-based receiver (SNMP4J, OpenNMS), a socket that never sets SO_RCVBUF starts at the kernel default (rmem_default), which is typically only a fraction of rmem_max. Set SO_RCVBUF explicitly at startup, or the headroom you raised in rmem_max goes unused.

Fix slow traphandle scripts

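snmptrapd serializes trap delivery to handler programs. A handler that takes 200 ms per trap can process only 5 traps per second. During a burst of 500 traps per second, the backlog grows by roughly 495 traps every second, and the socket buffer overflows as soon as that backlog exceeds what it can hold: well under a second with a small buffer, and still only seconds with a large one.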
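
The arithmetic can be made concrete. The sketch below estimates how long a buffer survives a burst, given the arrival rate and the handler's per-trap time. overflow_seconds is a hypothetical helper, and the per-datagram cost is an assumption you must supply: the kernel charges the socket for the whole packet buffer rather than the payload size, so the 2048 bytes used in the example is illustrative only.

```shell
#!/bin/sh
# overflow_seconds BUFFER_BYTES BYTES_PER_DATAGRAM TRAPS_PER_SEC HANDLER_MS
# Estimate seconds until the socket buffer fills when a serial handler
# drains more slowly than traps arrive.
overflow_seconds() {
    awk -v buf="$1" -v per="$2" -v rate="$3" -v ms="$4" 'BEGIN {
        if (ms <= 0 || per <= 0) { print "never"; exit }
        drain = 1000 / ms              # traps/s one serial handler completes
        if (rate <= drain) { print "never"; exit }
        capacity = buf / per           # datagrams the buffer can hold
        printf "%.1f\n", capacity / (rate - drain)
    }'
}

# 16 MB buffer, assumed 2048 bytes charged per datagram,
# 500 traps/s arriving, 200 ms handler:
#   overflow_seconds 16777216 2048 500 200    prints 16.5
```

Under the same assumptions a 212992-byte buffer lasts about 0.2 seconds, which is why a larger buffer only buys time and does not remove the need to fix the handler.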

Options:

  • Move expensive processing (database writes, API calls, enrichment) to an asynchronous queue. The handler should write to a local spool or message queue and return immediately.
  • Remove the handler entirely if it is not critical. Use -Lf /var/log/snmptrapd.log to write traps to a file and process them downstream with a separate consumer.
  • If you need real-time trap processing, consider a dedicated trap receiver that handles concurrency natively rather than relying on snmptrapd’s serial model.
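
The first option can be as small as a handler that copies the trap from stdin into a spool directory and exits. This is a sketch under assumptions, not a reference implementation: spool_trap and the /var/spool/snmptraps path are hypothetical, and a separate consumer (not shown) is expected to pick up and delete the trap.* files.

```shell
#!/bin/sh
# spool_trap SPOOL_DIR
# Copy one trap (as snmptrapd passes it to a traphandle program on stdin)
# into its own file and return immediately. No parsing, no network calls.
spool_trap() {
    spool_dir=$1
    # Write under a dot-file name first so a consumer that only reads
    # trap.* files never sees a half-written trap.
    tmp=$(mktemp "$spool_dir/.incoming.XXXXXX") || return 1
    cat > "$tmp" || { rm -f "$tmp"; return 1; }
    # A rename inside one directory is atomic on POSIX filesystems.
    mv "$tmp" "$spool_dir/trap.${tmp##*.}"
}

# As a traphandle program, the script body would be a single call:
#   spool_trap /var/spool/snmptraps     (hypothetical directory)
```

Each invocation costs one file write, so the daemon's serial delivery loop is held for as little time as the filesystem allows.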

Fix NIC ring buffer drops

If Udp_RcvbufErrors is flat but /proc/net/dev shows RX drops, increase the NIC ring buffer.

# Check current and maximum ring buffer settings
ethtool -g eth0

# Increase the RX ring toward the pre-set maximum reported by ethtool -g
# (4096 is an example value; disruptive: brief link flap possible)
ethtool -G eth0 rx 4096

Warning: ethtool -G may cause a brief interruption on the interface. Schedule during a maintenance window for critical collectors.
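
Before changing anything, it helps to know whether the ring has headroom at all. The sketch below pulls the current and maximum RX values out of ethtool -g output. rx_ring_headroom is a hypothetical helper, and it assumes the usual "Pre-set maximums:" and "Current hardware settings:" section headers.

```shell
#!/bin/sh
# rx_ring_headroom: read `ethtool -g IFACE` output on stdin and print
# "CURRENT MAX" for the RX ring. Prints nothing if either value is missing.
rx_ring_headroom() {
    awk '
        /^Pre-set maximums:/            { section = "max" }
        /^Current hardware settings:/   { section = "cur" }
        # "RX Mini:" and "RX Jumbo:" split into a first field of "RX",
        # so only the plain RX line matches here.
        $1 == "RX:" && section == "max" { max = $2 }
        $1 == "RX:" && section == "cur" { cur = $2 }
        END { if (cur != "" && max != "") print cur, max }
    '
}

# Usage:
#   ethtool -g eth0 | rx_ring_headroom
```

If the two numbers are already equal, ethtool -G cannot help and the drops need a different fix, such as spreading load across queues.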

Fix RSS misconfiguration

If one core is pinned at 100% softirq while others idle:

# Check IRQ distribution
cat /proc/interrupts | grep eth0

# Check RSS configuration
ethtool -x eth0

The fix depends on your NIC driver and platform. On most modern Intel NICs, ensure RSS is enabled and distributing across multiple queues. Check ethtool -l eth0 (lowercase l) for the combined channel count; the uppercase -L form changes it. Hypervisor environments, especially nested virtualization, may limit effective ring buffer and RSS settings regardless of guest configuration.
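
Reading /proc/interrupts by eye gets tedious on hosts with many cores. The sketch below sums the per-CPU interrupt counts for every line matching the interface and reports which CPU took the largest share. irq_share is a hypothetical helper, and it assumes the standard layout of a CPU header row followed by one row per IRQ.

```shell
#!/bin/sh
# irq_share PATTERN [FILE]
# Sum per-CPU interrupt counts for lines matching PATTERN (e.g. eth0)
# and print the busiest CPU with its share of the total.
irq_share() {
    awk -v pat="$1" '
        NR == 1 {
            # Header row: one column name per online CPU.
            ncpu = NF
            for (i = 1; i <= NF; i++) name[i] = $i
            next
        }
        $0 ~ pat {
            # Field 1 is the IRQ number; counts start at field 2.
            for (i = 1; i <= ncpu; i++) { per[i] += $(i + 1); total += $(i + 1) }
        }
        END {
            if (total == 0) { print "no-interrupts"; exit }
            for (i = 1; i <= ncpu; i++) if (per[i] > best) { best = per[i]; top = i }
            printf "%s %d%%\n", name[top], 100 * best / total
        }
    ' "${2:-/proc/interrupts}"
}

# Usage:
#   irq_share eth0
```

The counts are cumulative since boot, so for a burst diagnosis take two readings and compare, or read the result alongside mpstat.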

Patch CVE-2025-68615 if applicable

If you are running net-snmp versions prior to 5.9.5, a stack-based buffer overflow vulnerability (CVE-2025-68615, CVSS 9.8) can crash snmptrapd when it receives a crafted SNMP packet with an oversized enterprise OID. This presents as the daemon dying silently and all trap reception stopping. An unauthenticated remote attacker can trigger this by sending a malformed trap to UDP 162. Upgrade to net-snmp 5.9.5 or later. As an interim workaround, firewall UDP 162 to trusted source addresses only.
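
A quick way to flag hosts for follow-up is to compare the installed version string against 5.9.5. The sketch below assumes GNU sort with -V (version sort) is available. is_vulnerable is a hypothetical helper, and it takes the version as an argument rather than guessing how your platform reports it. Distributions often backport security fixes without changing the upstream version number, so treat a positive result as a prompt to check the vendor advisory, not as proof.

```shell
#!/bin/sh
# is_vulnerable VERSION
# Succeeds (exit 0) when VERSION sorts below the fixed release 5.9.5.
# Relies on `sort -V`, a GNU coreutils extension.
is_vulnerable() {
    fixed=5.9.5
    if [ "$1" = "$fixed" ]; then return 1; fi
    lowest=$(printf '%s\n%s\n' "$1" "$fixed" | sort -V | head -n 1)
    [ "$lowest" = "$1" ]
}

# Usage:
#   if is_vulnerable 5.9.1; then echo "check vendor advisory"; fi
```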

Prevention

  • Monitor Udp_RcvbufErrors continuously. Any nonzero value on a trap collector is abnormal. This is the signal most teams discover in postmortems, not on dashboards. Track it as a rate, not just a counter.
  • Set rmem_max and rmem_default to 16 MB or higher on all UDP telemetry collectors. The 4 MB default is a known inadequate baseline for production trap, flow, and syslog receivers.
  • Baseline per-device trap volume. A single device dominating trap volume is a finding. Track trap source diversity: if one sender accounts for more than 50% of total traps during normal operation, investigate whether it is a noisy neighbor or a device in early failure.
  • Audit snmptrapd.conf access control after every net-snmp upgrade. The 5.3 access-control enforcement change catches many operators upgrading from 5.2. Traps from devices whose communities are not explicitly authorized are silently discarded.
  • Separate the trap handler from the receiver. If you need downstream processing (SIEM forwarding, enrichment, alerting), use an asynchronous queue between snmptrapd and the consumer. Never let a slow consumer block the daemon’s trap delivery loop.
  • Monitor per-core CPU on the collector. RSS misconfiguration is invisible in aggregate CPU metrics. Always check per-core utilization during burst windows.
  • Track trap inbound rate against device-side export counters where possible. Comparing what the device says it sent against what the collector received is the only reliable end-to-end loss detection method. If the device exported 500 traps and you logged 300, the gap is silent loss.
  • Be aware that trap receivers do not expose per-source drop counts. All drops aggregate to a system-wide counter. You cannot tell which exporter’s traps were dropped. This is a known instrumentation limitation. Plan around it by monitoring the system-wide counter aggressively and correlating gaps with known device events.
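
The sent-versus-received comparison reduces to a subtraction and a percentage. A minimal sketch: trap_loss_pct is a hypothetical helper, and where the "sent" figure comes from is up to you (for example a device-side counter such as snmpOutTraps, where the device implements one).

```shell
#!/bin/sh
# trap_loss_pct SENT RECEIVED
# Compare what a device reports sending with what the collector logged.
trap_loss_pct() {
    awk -v sent="$1" -v got="$2" 'BEGIN {
        if (sent <= 0) { print "n/a"; exit }
        lost = sent - got
        # Sampling skew can make the collector appear ahead; clamp at zero.
        if (lost < 0) lost = 0
        printf "%d lost (%.1f%%)\n", lost, 100 * lost / sent
    }'
}

# Example from the text: the device exported 500 traps, the collector logged 300
#   trap_loss_pct 500 300    prints 200 lost (40.0%)
```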

How Netdata helps

  • Udp_RcvbufErrors is collected by default as part of Netdata’s IPv4 UDP monitoring. You get a rate chart without any configuration. Alert on any nonzero increment on a host running a trap or flow collector.
  • Per-core CPU is collected at 1-second resolution, including softirq time. RSS misconfiguration that funnels all packet processing to one core is visible immediately in the per-core breakdown.
  • NIC RX/TX drops are collected from /proc/net/dev at the per-interface level, letting you distinguish ring buffer drops from socket buffer drops without manual ethtool sessions.
  • net.core.rmem_max and rmem_default are exposed as system tunables, so you can verify buffer settings across your fleet and detect drift.
  • Correlation across signals matters here. During a link-flap cascade, a trap receiver will show rising Udp_RcvbufErrors alongside a rising trap receive rate and possibly NIC RX drops. Netdata’s unified timeline correlates these signals, accelerating diagnosis compared to checking each counter in isolation.