BGP NOTIFICATION and Cease messages: what each subcode is telling you

A BGP NOTIFICATION in your router log is a peer telling you why it tore down the session. The message carries an error code and an error subcode. Those two numbers tell you whether you are looking at a maintenance window, a route leak, a prefix-limit hit, a CPU-starved control plane, or a BFD-triggered teardown.

Cease (code 6) is the most common NOTIFICATION. Its subcodes, defined in RFC 4486 and extended by RFC 8538 and RFC 9384, hold most of the diagnostic value. Codes 2 through 5 appear less often but point to distinct failure classes: parameter mismatch, malformed updates, hold-timer expiry, and FSM errors.

NOTIFICATION message structure

Every BGP NOTIFICATION message has three fields:

  • Error Code (1 byte): the broad category of error.
  • Error Subcode (1 byte): a more specific reason within that category.
  • Data (variable length): diagnostic payload whose format depends on the error code.

Router logs render this as “code/subcode” with a human-readable description. For example, %BGP-3-NOTIFICATION: received from neighbor x.x.x.x active 6/1 (cease: max-prefixes reached) means error code 6 (Cease), subcode 1 (Maximum Number of Prefixes Reached). The subcode is where you look first.
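As a rough illustration of the three-field layout, the following Python sketch splits a raw NOTIFICATION into code, subcode, and data. It assumes the buffer holds exactly one complete message, including the 19-byte BGP header from RFC 4271. The names parse_notification and Notification are illustrative, not part of any library.

```python
import struct
from typing import NamedTuple

BGP_HEADER_LEN = 19          # 16-byte marker + 2-byte length + 1-byte type (RFC 4271)
BGP_TYPE_NOTIFICATION = 3    # message type value for NOTIFICATION


class Notification(NamedTuple):
    code: int
    subcode: int
    data: bytes


def parse_notification(message: bytes) -> Notification:
    """Split one complete BGP NOTIFICATION message into its three fields."""
    if len(message) < BGP_HEADER_LEN + 2:
        raise ValueError("too short to be a NOTIFICATION")
    # Length and type sit right after the 16-byte marker.
    length, msg_type = struct.unpack_from("!HB", message, 16)
    if msg_type != BGP_TYPE_NOTIFICATION:
        raise ValueError(f"not a NOTIFICATION (type {msg_type})")
    if length != len(message):
        raise ValueError("length field does not match buffer size")
    return Notification(message[19], message[20], message[21:])


# Hand-built example: Cease (6) / Maximum Number of Prefixes Reached (1), empty Data.
example = b"\xff" * 16 + struct.pack("!HB", 21, BGP_TYPE_NOTIFICATION) + bytes([6, 1])
```

Calling parse_notification(example) yields code 6, subcode 1, and an empty data field, which is the same 6/1 pair that the log line above reports.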

Error codes at a glance

The IANA BGP Parameters registry defines nine error codes (excluding reserved). Most production incidents involve codes 2 through 6.

Code | Name | What it means operationally | RFC
1 | Message Header Error | Connection-level framing problem. Rare in modern implementations. | 4271
2 | OPEN Message Error | BGP parameters mismatched during session setup. Common after config changes. | 4271
3 | UPDATE Message Error | An UPDATE carried a malformed or invalid attribute. Often a bad AS-path or invalid attribute. | 4271
4 | Hold Timer Expired | Peer did not receive keepalives within the negotiated hold time. Frequently indicates control-plane CPU saturation. | 4271
5 | Finite State Machine Error | Peer received an unexpected message for its current FSM state. Usually a software bug or race condition. | 4271
6 | Cease | Peer intentionally terminated the session. Subcode carries the reason. Most common code in production. | 4271
7 | ROUTE-REFRESH Message Error | Malformed route-refresh request. Rare unless route-refresh is heavily used. | 7313
8 | Send Hold Timer Expired | The sender could not transmit any BGP message for the send hold interval, typically because its neighbor stopped reading from the TCP connection. Distinct from code 4 (receive side). | 9687
9 | Loss of LSDB Synchronization | BGP-LS deployments only. Not applicable to conventional BGP peering. | 9815

Codes 8 and 9 are recent additions. Code 8 distinguishes a send-side timeout from the classic receive-side Hold Timer Expired (code 4). Code 9 applies to BGP-LS.

Cease subcodes decoded

Cease (code 6) carries a subcode that tells you why the peer tore down the session. The original eight subcodes come from RFC 4486. Subcode 9 (Hard Reset) was added by RFC 8538, and subcode 10 (BFD Down) was added by RFC 9384.

Subcode | Name | RFC | What triggered it
1 | Maximum Number of Prefixes Reached | 4486 | The sender received more prefixes than its configured limit allows. Could be a route leak or organic growth.
2 | Administrative Shutdown | 4486, 8203 | Peer intentionally shut down the session, typically for maintenance.
3 | Peer De-configured | 4486 | Peer removed your configuration on their end.
4 | Administrative Reset | 4486, 8203 | Peer reset the session, usually after a policy change.
5 | Connection Rejected | 4486 | Peer refused the TCP connection. Often a policy or peer-group mismatch.
6 | Other Configuration Change | 4486 | Peer made a configuration change that does not fit subcodes 2 through 5.
7 | Connection Collision Resolution | 4486 | Two simultaneous connection attempts resolved by closing one. Benign.
8 | Out of Resources | 4486 | Peer ran out of memory or other resources.
9 | Hard Reset | 8538 | Peer demands a full session reset, defeating Graceful Restart.
10 | BFD Down | 9384 | Associated BFD session went down, triggering BGP teardown.

Subcode 0 (Reserved) appears when no specific subcode applies. RFC 4271 defines it as “Unspecific.” Some vendors log it as-is; others substitute a generic description.
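For scripts that only have the numeric pair, a small lookup makes the tables above usable in code. This is a minimal Python sketch: the names are copied from the tables in this article, and describe is a hypothetical helper, not a vendor or library API.

```python
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
    7: "ROUTE-REFRESH Message Error",
    8: "Send Hold Timer Expired",
    9: "Loss of LSDB Synchronization",
}

CEASE_SUBCODES = {
    0: "Unspecific",
    1: "Maximum Number of Prefixes Reached",
    2: "Administrative Shutdown",
    3: "Peer De-configured",
    4: "Administrative Reset",
    5: "Connection Rejected",
    6: "Other Configuration Change",
    7: "Connection Collision Resolution",
    8: "Out of Resources",
    9: "Hard Reset",
    10: "BFD Down",
}


def describe(code: int, subcode: int) -> str:
    """Render a code/subcode pair as text; only Cease subcodes are expanded."""
    code_name = ERROR_CODES.get(code, f"Unknown code {code}")
    if code == 6:
        sub_name = CEASE_SUBCODES.get(subcode, f"Unknown subcode {subcode}")
        return f"{code}/{subcode} {code_name}: {sub_name}"
    return f"{code}/{subcode} {code_name}"
```

Unknown values fall through to a generic label instead of raising, since new subcodes are added to the registry over time.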

flowchart TD
    A["NOTIFICATION received"] --> B{"Error code?"}
    B -->|"6 Cease"| C["Decode subcode"]
    B -->|"2 OPEN"| D["AS, hold time, capability mismatch"]
    B -->|"3 UPDATE"| E["Malformed path attribute"]
    B -->|"4 Hold Timer"| F["Control-plane CPU saturation"]
    C --> G{"Cease subcode?"}
    G -->|"1 Max Prefix"| H["Route leak or growth"]
    G -->|"2/4 Admin"| I["Maintenance or policy"]
    G -->|"9/10"| J["Hard reset or BFD down"]

What each Cease subcode means in practice

Subcode 1: Maximum Number of Prefixes Reached. The router that sends this NOTIFICATION received more prefixes than its configured maximum-prefix limit allows. If you received Cease/1, your router advertised more than the peer's limit; if your router sent it, the peer exceeded yours. Two scenarios: organic growth that exceeded a stale limit, or a route leak where one side is advertising prefixes it should not. The session is torn down and all routes learned over it are withdrawn. Check per-peer prefix-count trends to determine whether this was gradual (growth) or sudden (leak). If you see RPKI-invalid routes from the same peer around the same time, treat it as a potential route leak.
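The gradual-versus-sudden check can be roughed out in a few lines. The Python sketch below is only a heuristic under stated assumptions: counts are sampled at a fixed interval, oldest first, and the 20% jump threshold is an arbitrary starting point to tune against your own data. The function name is hypothetical.

```python
from typing import Sequence


def classify_prefix_trend(counts: Sequence[int], jump_ratio: float = 0.2) -> str:
    """Return "sudden" if any single step grows by at least jump_ratio of the
    first sample, otherwise "gradual".

    counts: per-peer prefix counts, oldest first, sampled at a fixed interval.
    jump_ratio: assumed threshold (20% by default), not a standard value.
    """
    if len(counts) < 2:
        raise ValueError("need at least two samples")
    largest_step = max(later - earlier for earlier, later in zip(counts, counts[1:]))
    baseline = max(counts[0], 1)  # avoid division by zero on an empty table
    return "sudden" if largest_step / baseline >= jump_ratio else "gradual"
```

A series such as 1000, 1005, 1010, 1800 is classified as sudden, while 1000, 1020, 1040, 1060 is classified as gradual.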

Subcode 2: Administrative Shutdown. The peer intentionally brought the session down, typically for planned maintenance. RFC 8203 adds an optional UTF-8 shutdown communication string (up to 128 octets; RFC 9003, which obsoletes RFC 8203, raises the limit to 255) that the receiving implementation should report to the operator, for example via syslog. If your peer supports it, the log line will contain a freeform reason such as a ticket number. Check your change management system. If there is no change ticket, this could be an emergency shutdown on the peer side.
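If you capture the raw Data field yourself, the shutdown communication is straightforward to extract. This Python sketch assumes the RFC 8203 layout of one length octet followed by that many bytes of UTF-8 text. The function name and the ticket text in the example are made up for illustration.

```python
from typing import Optional


def decode_shutdown_communication(data: bytes) -> Optional[str]:
    """Extract the freeform text from the Data field of a Cease/2 or Cease/4.

    Returns None when no communication is present. Raises ValueError if the
    field is truncated and UnicodeDecodeError if the text is not valid UTF-8.
    """
    if not data or data[0] == 0:
        return None
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("shutdown communication is truncated")
    return payload.decode("utf-8")


# Hypothetical payload, built the way a sender would encode it.
text = "[TICKET-1234] software upgrade".encode("utf-8")
sample = bytes([len(text)]) + text
```

Decoding sample returns the original string. Invalid UTF-8 is left to raise rather than being silently repaired, so a bad payload stays visible to the operator.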

Subcode 3: Peer De-configured. The peer removed your BGP configuration entirely. This is not a transient event. Someone on the peer side deleted or commented out your neighbor statement. Contact the peer’s NOC.

Subcode 4: Administrative Reset. Similar to subcode 2 but typically triggered by a policy change rather than a full shutdown. The peer applied a new route-map, changed import or export policy, or reloaded BGP configuration. RFC 8203 shutdown communication also applies to this subcode. If this happens outside a maintenance window, investigate whether the peer changed filtering policy that affects your routes.

Subcode 5: Connection Rejected. The peer refused the incoming TCP connection. On Juniper devices, this often appears with a log line like “no group for IP+port from AS X found”, meaning the connection arrived from an address that does not match any configured peer group. This is a configuration mismatch on the peer side, not a protocol error. Verify that the peer’s configuration references the correct source IP for your router.

Subcode 6: Other Configuration Change. A catch-all for configuration changes that do not fit subcodes 2 through 5. Less specific, but still indicates a deliberate change on the peer side. Check with the peer’s NOC.

Subcode 7: Connection Collision Resolution. Both sides initiated TCP connections simultaneously, and BGP collision detection resolved the duplicate by keeping one and closing the other. This is normal protocol behavior. No action needed unless it recurs frequently, which may indicate a timer or topology issue causing both sides to reconnect at the same time.

Subcode 8: Out of Resources. The peer ran out of memory, TCAM, or another finite resource. This is a peer-side capacity problem. Correlate with the peer’s control-plane CPU and memory if you have visibility. If this recurs, the peer may need a hardware upgrade or RIB optimization.

Subcode 9: Hard Reset (RFC 8538). The peer demands a full session reset with no Graceful Restart assistance. The triggering NOTIFICATION is encapsulated in the data portion of the Hard Reset message. Upon receipt, the receiving speaker must flush all routes from that peer and perform a complete session reset. This explicitly defeats any Graceful Restart helper behavior. FRRouting had a bug (issue #21822) where the GR helper incorrectly retained stale routes on receipt of Cease(6) or Hard Reset(9). If you run FRR, verify your version includes the fix.
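Because the real reason is nested inside the Hard Reset, tooling has to unwrap it before reporting. A minimal Python sketch, assuming the Data field of a 6/9 message begins with the encapsulated error code and subcode as described above; unwrap_hard_reset is a hypothetical helper name.

```python
from typing import Tuple


def unwrap_hard_reset(code: int, subcode: int, data: bytes) -> Tuple[int, int, bytes]:
    """Return the encapsulated (code, subcode, data) of a Hard Reset.

    Any NOTIFICATION that is not Cease/Hard Reset (6/9) is returned unchanged.
    """
    if (code, subcode) != (6, 9):
        return code, subcode, data
    if len(data) < 2:
        raise ValueError("Hard Reset carries no encapsulated code/subcode")
    return data[0], data[1], data[2:]
```

For example, a Hard Reset whose Data field is the two bytes 06 02 unwraps to Cease/Administrative Shutdown, which is the reason you would act on.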

Subcode 10: BFD Down (RFC 9384). The peer tore down the BGP session because the associated BFD session went Down. RFC 9384 makes this a SHOULD, not a MUST. Some implementations send generic Cease without subcode 10 when BFD brings down the session. The underlying BFD failure is the real event. Investigate BFD session state on both endpoints. BFD down typically means path loss or excessive latency and jitter that exceeded BFD thresholds.

Beyond Cease: error codes you will see in production

Code 2: OPEN Message Error. Session setup failed during parameter negotiation. Common causes: wrong remote AS number, unsupported capabilities, or an unacceptable hold time. These almost always indicate a configuration error on one side. A TCP MD5 key mismatch usually does not produce this code, because the TCP segments are dropped before any OPEN is exchanged; it shows up instead as a session that never leaves Active or Connect. If code 2 appears after a config change, check the AS number and the address-family or capability negotiation settings. RFC 9234 (2022) deprecated OPEN subcodes 8, 9, and 10 and added subcode 11 (Role Mismatch) for BGP role conflict scenarios.

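Code 3: UPDATE Message Error. The peer sent a route with a malformed or invalid attribute. This could be a malformed AS_PATH, an invalid ORIGIN, a malformed NLRI, or an optional transitive attribute the receiver could not parse. An AS-path loop alone does not trigger this code; the receiver simply excludes the looped route. The data field of the NOTIFICATION usually identifies the offending attribute. Implementations that follow RFC 7606 handle many of these errors with treat-as-withdraw instead of a session reset, so a code 3 teardown points either to an error that still forces a reset or to an older implementation. The cause is usually a software bug on the sending side or further upstream. If it recurs, capture the UPDATE and report it to the peer.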

Code 4: Hold Timer Expired. The router that sent this NOTIFICATION received no KEEPALIVE or UPDATE messages within the negotiated hold time. In production, the most common root cause is control-plane CPU saturation. When CPU is pegged, keepalive generation starves on the sending side or keepalive processing stalls on the receiving side, so the starved router can be at either end. Check control-plane CPU on both your router and the peer. If you have SNMP visibility, poll cpmCPUTotal5sec on Cisco or the vendor equivalent. Correlate with any concurrent SNMP polling that might be contributing to CPU load.

Code 5: Finite State Machine Error. The peer received a BGP message it did not expect in its current FSM state. This is almost always a software bug, a race condition during session establishment, or a duplicate connection. Rare in stable production environments. If it recurs, capture debug output and report it to the vendor.

How to retrieve the last NOTIFICATION

The BGP4-MIB (RFC 4273) exposes the last error per peer via bgpPeerLastError at OID .1.3.6.1.2.1.15.3.1.14. This two-byte OCTET STRING holds the error code in the first byte and the subcode in the second, as last seen on that peer's connection. A value of zero means no error has occurred. The value persists until the next NOTIFICATION or a session reset that clears it.
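The polled value still has to be split into code and subcode. The Python sketch below assumes the SNMP tool prints the two bytes in Net-SNMP's Hex-STRING form; output formatting varies with MIB loading and tool options, so treat the sample line and the regular expression as assumptions to adapt.

```python
import re
from typing import Optional, Tuple

# Matches the first two hex bytes after "Hex-STRING:" (assumed output format).
HEX_VALUE = re.compile(r"Hex-STRING:\s*([0-9A-Fa-f]{2})\s+([0-9A-Fa-f]{2})")


def parse_last_error(line: str) -> Optional[Tuple[int, int]]:
    """Return (code, subcode) from one snmpwalk output line, or None if the
    line does not contain a two-byte Hex-STRING value."""
    match = HEX_VALUE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


# Illustrative line only; the peer address is from the documentation range.
sample_line = "SNMPv2-SMI::mib-2.15.3.1.14.192.0.2.1 = Hex-STRING: 06 01"
```

For the sample line this returns (6, 1), that is Cease with Maximum Number of Prefixes Reached. A result of (0, 0) means the peer has recorded no error.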

# Retrieve last error for all peers via SNMP
snmpwalk -v2c -c <community> <router> .1.3.6.1.2.1.15.3.1.14

# Vendor CLI: show the last notification received from a peer
ssh <router> 'show ip bgp neighbors <peer> | include notification|Last'

# Check syslog for recent BGP NOTIFICATION messages
ssh <router> 'show logging | include BGP'

# FRRouting equivalent
vtysh -c 'show bgp neighbors <peer>'
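
Once the syslog lines are collected, the code/subcode pair can be pulled out with a regular expression. This Python sketch targets only the Cisco-style line format quoted earlier in this article. Other vendors and software versions format the message differently, so the pattern is an assumption to adapt, and parse_notification_log is a hypothetical name.

```python
import re
from typing import Dict, Optional

NOTIFICATION_LINE = re.compile(
    r"%BGP-\d-NOTIFICATION: (?P<direction>received from|sent to) "
    r"neighbor (?P<peer>\S+) (?:\S+ )*?"          # optional words such as "active"
    r"(?P<code>\d+)/(?P<subcode>\d+) \((?P<text>[^)]*)\)"
)


def parse_notification_log(line: str) -> Optional[Dict[str, object]]:
    """Extract direction, peer, code, subcode, and description from one line."""
    match = NOTIFICATION_LINE.search(line)
    if match is None:
        return None
    return {
        "direction": match.group("direction"),
        "peer": match.group("peer"),
        "code": int(match.group("code")),
        "subcode": int(match.group("subcode")),
        "text": match.group("text"),
    }


# Example line in the format shown earlier, with a documentation-range address.
sample_log = ("%BGP-3-NOTIFICATION: received from neighbor 192.0.2.1 "
              "active 6/1 (cease: max-prefixes reached)")
```

Keeping the direction field matters: "received from" and "sent to" point at opposite ends of the session when you start troubleshooting.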

RFC 4273 also defines the bgpBackwardTransNotification trap (bgpBackwardTransition is the older, deprecated name), which fires when a peer's FSM moves from a higher-numbered state to a lower-numbered one. If your trap receiver is configured to accept BGP traps, the trap payload carries the peer address, bgpPeerLastError, and the new bgpPeerState. Traps are UDP and can be dropped under load. For reliability, pair trap monitoring with periodic SNMP polling of bgpPeerState at .1.3.6.1.2.1.15.3.1.2.
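
On the polling side, a backward transition can be detected by comparing successive bgpPeerState samples. The sketch below is a minimal Python illustration that uses the state numbering from the BGP4-MIB (idle is 1, established is 6). The sample format and function name are assumptions, and a flap that starts and recovers between two polls will not be seen.

```python
from typing import List, Sequence, Tuple

BGP_PEER_STATES = {
    1: "idle",
    2: "connect",
    3: "active",
    4: "opensent",
    5: "openconfirm",
    6: "established",
}


def backward_transitions(
    samples: Sequence[Tuple[int, int]],
) -> List[Tuple[int, str, str]]:
    """Return (timestamp, old_state, new_state) for every poll where the state
    number dropped compared with the previous poll.

    samples: (timestamp, bgpPeerState) pairs for one peer, oldest first.
    """
    events = []
    for (_, previous), (timestamp, current) in zip(samples, samples[1:]):
        if current < previous:
            events.append(
                (timestamp, BGP_PEER_STATES[previous], BGP_PEER_STATES[current])
            )
    return events
```

For polls at 60-second intervals returning 6, 6, 1, 3, 6, the function reports a single event at the third poll, from established to idle. The later climb back through active to established is forward progress and is ignored.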

Monitoring signals to correlate with NOTIFICATION messages

A NOTIFICATION tells you what happened. These signals tell you why.

Signal | Why it matters | Warning sign
Per-peer prefix count | Sudden increase before Cease/1 indicates a route leak or growth past the limit | Prefix count rising sharply in minutes before the NOTIFICATION
Control-plane CPU | Hold Timer Expired (code 4) is frequently CPU-induced | CPU above 90% sustained before the session drop
RPKI/ROA validation state | Invalid routes from the peer confirm a leak or hijack | RPKI-invalid count above zero from the same peer
BFD session state | Subcode 10 traces back to a BFD failure | BFD session in Down state at the same timestamp
bgpPeerInUpdates rate | Established with zero updates means stale routing | Update rate flat for an extended period without a session state change
Syslog severity distribution | NOTIFICATION messages often arrive in clusters during incidents | Multiple BGP events from the same peer within minutes
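
The control-plane CPU row can be checked mechanically once you have timestamped samples. This Python sketch is a rough heuristic under stated assumptions: the 90% threshold comes from the table, while the 300-second lookback window and the function name are arbitrary choices for illustration.

```python
from typing import Sequence, Tuple


def cpu_sustained_before(
    samples: Sequence[Tuple[float, float]],
    drop_time: float,
    window: float = 300.0,
    threshold: float = 90.0,
) -> bool:
    """Return True if every CPU sample in the window before the session drop
    is above the threshold.

    samples: (timestamp_seconds, cpu_percent) pairs in any order.
    window: lookback in seconds (assumed default, tune to your poll interval).
    """
    in_window = [cpu for ts, cpu in samples if drop_time - window <= ts <= drop_time]
    # No samples in the window means there is no evidence either way.
    return bool(in_window) and all(cpu > threshold for cpu in in_window)
```

With samples of 40, 95, 97, and 99 percent at 0, 100, 200, and 300 seconds and a drop at 300 seconds, a 200-second window reports sustained saturation, while a 300-second window does not because it includes the 40 percent sample.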

How Netdata helps

  • SNMP-based BGP session monitoring: Netdata polls bgpPeerState per peer and can alert on transitions out of Established. Pair this with bgpPeerLastError to surface the code and subcode without parsing syslog.
  • Syslog ingestion and parsing: Netdata’s syslog collector captures BGP NOTIFICATION messages as they arrive, letting you correlate the exact timestamp with interface state, CPU, and prefix-count changes on the same timeline.
  • Control-plane CPU correlation: When Hold Timer Expired fires, Netdata shows the CPU trend alongside the session drop, making the root cause visible in seconds.
  • Prefix-count trends: Per-peer prefix counts tracked over time reveal whether Cease/1 was a sudden leak or gradual growth that crossed a threshold.
  • Trap collection: The SNMP trap receiver catches bgpBackwardTransition notifications, providing push-based session transition alerts alongside polled state.