Vendor API 429 throttling: Meraki, Cato, and PAN-OS rate limits
Meraki, Cato, or PAN-OS API-polled devices go dark in your dashboard while ICMP and SNMP to the same devices return healthy responses. Every device sourced from the same vendor API flatlines at the same timestamp. The cause: your collector exhausted the vendor API rate-limit budget and is now receiving HTTP 429 instead of data.
This pattern is frequently misdiagnosed because the symptom (devices appearing “down”) sits two layers above the cause (rate limit exhausted). It surfaces most often during incidents, when teams tighten polling intervals for faster data, or silently when multiple tools share a single API key without coordination.
What this means
HTTP 429 means the vendor refused your request because you exceeded the rate limit for the current window. Each vendor enforces limits differently:
Meraki Dashboard API: Two independent dimensions: throughput (10 req/sec per organization, with a burst of up to 30 requests in the first 2 seconds) and concurrency (10 concurrent requests per source IP). Throughput is counted per organization; concurrency is counted per source IP. A 429 response includes a Retry-After header. Some administrative endpoints have stricter limits: 10 requests per 5-minute window per IP.
Cato GraphQL API: Per-query, per-account limits. General limit: 120 requests/minute. Specific queries have lower ceilings: accountSnapshot at 1/sec, accountMetrics at 15/min, eventsFeed at 100/min. Two users issuing different query names do not share a counter, but two users issuing the same query name share one counter across all API keys on that account. Cato does not formally publish rate-limit response headers, so verify header behavior empirically against your own account.
PAN-OS XML/REST API: No published per-second rate. Palo Alto recommends a maximum of 5 concurrent API calls per firewall or Panorama. Exceeding this does not always produce a 429; it degrades the management-plane web server, which serves both API and web UI requests. Prisma Cloud CSPM exposes X-RateLimit-Remaining and related headers keyed per user.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Multiple collectors sharing one API key | All API-polled devices from one vendor go dark simultaneously; no single collector exceeds the limit alone | Count every tool, script, or integration using the same key |
| Polling frequency increased during incident | 429s start shortly after a scrape interval change or ad-hoc query burst | Check collector config history; correlate 429 onset with the change |
| Runaway script or automation loop | Sudden burst of 429s with high request volume from one source | Check vendor audit trail for request volume anomalies |
| Per-query collision (Cato) | One Cato query type throttles while others remain healthy | Identify which GraphQL query name hits its per-query limit |
| Concurrency exhaustion (Meraki) | Slow responses exhaust the 10-connection cap while staying under 10 req/sec | Check collector worker pool size; look for slow API responses |
| API key rotation or expiry | 401/403 responses mixed with 429s; some endpoints work, others fail | Verify key validity; SAML/SSO admins on Meraki cannot generate API keys |
Quick checks
# Check Meraki rate-limit headers (read-only; single org list call)
curl -sI -H "Authorization: Bearer $MERAKI_KEY" \
  https://api.meraki.com/api/v1/organizations | grep -i 'ratelimit\|retry'
# Check Cato response headers for rate-limit indicators (requires valid POST)
curl -s -D - -o /dev/null \
  -H "x-api-key: $CATO_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"{ __typename }"}' \
  https://api.catonetworks.com/api/v1/graphql2 | grep -i 'ratelimit\|retry'
# Check PAN-OS API responsiveness and management-plane health
curl -sk "https://$FW_HOST/api/?type=op&cmd=<show><system><info></info></system></show>&key=$PAN_KEY" \
-w "\nHTTP: %{http_code} Time: %{time_total}s\n"
# Verify ICMP still works to the same devices (should succeed if only API is throttled)
ping -c 3 -i 0.2 <device-ip>
# Check if SNMP still returns data (should succeed if only API is throttled)
snmpget -v2c -c <community> <device> .1.3.6.1.2.1.1.3.0
# Probe actual HTTP status codes from the vendor API over a short window
# Note: each call consumes API quota
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -H "Authorization: Bearer $MERAKI_KEY" \
    https://api.meraki.com/api/v1/organizations
  sleep 0.5
done
# Verify PAN-OS API key validity (invalid key returns error inside HTTP 200 body)
curl -sk "https://$FW_HOST/api/?type=op&cmd=<show><system><info></info></system></show>&key=$PAN_KEY" | head -5
How to diagnose it
flowchart TD
A[API-polled devices go dark] --> B{ICMP/SNMP still working?}
B -- No --> C[Network or device outage]
B -- Yes --> D{HTTP 429 in collector logs?}
D -- No --> E[Check 401/403: key rotation]
D -- Yes --> F{Which vendor?}
F -- Meraki --> G[Check Retry-After header]
F -- Cato --> H[Identify throttled query name]
F -- PAN-OS --> I[Check concurrent connection count]
G --> J[Audit key consumers and poll cadence]
H --> J
I --> J
J --> K[Reduce consumption or shard API keys]
1. Confirm the scope. Are only API-polled devices affected? Run the ICMP and SNMP quick checks above. If both return healthy, the network and devices are fine; the problem is between your collector and the vendor API.
2. Identify the HTTP status code. Inspect collector logs or run the manual curl loop. A stream of 429s confirms throttling. 401 or 403 responses mixed in suggest key rotation or expiry. Meraki returns 404 (not 403) on a bad API key by design, to avoid leaking resource existence. PAN-OS returns <response status="error"> inside an HTTP 200 body, so HTTP status alone is misleading.
3. Identify the throttling dimension. For Meraki, check the Retry-After header on the 429 response. For Cato, identify which GraphQL query name is hitting its limit: the general 120/min applies broadly, but specific queries like accountSnapshot (1/sec) or accountMetrics (15/min) have much lower ceilings. For PAN-OS, check whether you have more than 5 concurrent API calls in flight to the same firewall or Panorama.
4. Audit key consumers. The most common root cause is a shared API key consumed by multiple tools with no coordination. Enumerate every system that uses the vendor API key: the NMS, automation scripts, third-party integrations, ad-hoc dashboards, and CI/CD pipelines. For Cato specifically, same-named queries share a counter across all API keys on an account, so splitting keys alone does not solve per-query collisions.
5. Check for recent polling changes. Correlate the onset of 429s with collector configuration changes. Did someone decrease the poll interval? Add a batch of new devices? Start a new API-driven compliance report?
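The status-code step above can be sketched as a small classifier. This is a minimal illustration, not a vendor-provided routine: the function name, the vendor labels ("meraki", "cato", "panos"), and the returned label strings are all hypothetical. It only encodes the behaviors described in the steps: 429 means throttling, 401/403 point at the key, Meraki may answer 404 for a bad key, and PAN-OS can report an error inside an HTTP 200 body.

```python
import xml.etree.ElementTree as ET


def classify_response(vendor: str, status: int, body: str = "") -> str:
    """Map one vendor API response to a coarse diagnosis label (hypothetical labels)."""
    if status == 429:
        return "throttled"
    if status in (401, 403):
        return "auth"
    # Per the text above, Meraki answers 404 rather than 403 for a bad key.
    if vendor == "meraki" and status == 404:
        return "auth-or-missing"
    if vendor == "panos" and status == 200:
        # PAN-OS signals failure as <response status="error"> inside a 200 body,
        # so the body has to be parsed before the call counts as a success.
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            return "unparseable"
        if root.tag == "response" and root.get("status") == "error":
            return "panos-error"
        return "ok"
    if 200 <= status < 300:
        return "ok"
    return "other"
```

Feeding collector log entries through a function like this separates "throttled" from "auth" failures before anyone starts changing poll intervals.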
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| HTTP 429 rate from vendor API | Direct indicator of active throttling | Any sustained rate above 0 |
| Retry-After header (Meraki) | How long the vendor wants you to back off | Presence means you are throttled |
| API request latency | Rising latency often precedes 429s as the vendor applies backpressure | p99 latency greater than 5x baseline |
| API rate-limit remaining (where exposed) | Leading indicator before the throttle cliff | Below 20% of quota per window |
| Data freshness for API-sourced metrics | Staleness is the downstream symptom of throttling | Time since last successful poll greater than 2x poll interval |
| X-RateLimit-Remaining (Prisma Cloud CSPM) | Per-user remaining budget exposed in headers | Value at 0 means throttled |
| PAN-OS management-plane web UI responsiveness | API over-consumption degrades the shared web server process | Web UI sluggish when API calls are in flight |
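Three of the warning signs in the table reduce to simple comparisons. The sketch below assumes the collector already records the inputs (seconds since the last successful poll, p99 latency and its baseline, and remaining quota where the vendor exposes it); the function names are hypothetical and the thresholds are the ones from the table.

```python
def freshness_warning(seconds_since_success: float, poll_interval: float) -> bool:
    """Stale when the last successful poll is older than 2x the poll interval."""
    return seconds_since_success > 2 * poll_interval


def latency_warning(p99: float, baseline: float) -> bool:
    """Warn when p99 latency exceeds 5x the baseline."""
    return p99 > 5 * baseline


def quota_warning(remaining: int, quota: int) -> bool:
    """Warn when less than 20% of the per-window quota remains."""
    # Integer form of remaining / quota < 0.2, avoiding float rounding.
    return remaining * 5 < quota
```

The freshness check is the one that still fires when the collector has stopped logging errors, which is why it is worth wiring up even if 429 counts are already charted.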
Fixes
Reduce polling frequency
Increase the poll interval for API-sourced data to stay below 70% of the documented limit. For Meraki, that means staying well under 10 req/sec/org across all consumers of that key. For Cato, calculate per-query consumption: if you poll accountSnapshot every second, you are at the 1/sec ceiling with zero headroom for any other consumer. Back off to every 5 to 10 seconds.
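The arithmetic behind the 70% target can be sketched as follows. This is an illustration under stated assumptions: all consumers report their rates in the same unit as the limit, and the function names and the example figures in the comments are hypothetical.

```python
def utilization(consumer_rates, limit):
    """Fraction of the limit used by all consumers combined (same unit as limit)."""
    return sum(consumer_rates) / limit


def min_poll_interval(calls_per_poll, limit_per_sec, target=0.7):
    """Shortest poll interval in seconds that keeps one poll cycle of
    calls_per_poll requests at or below target * limit_per_sec on average."""
    return calls_per_poll / (limit_per_sec * target)


# Three hypothetical tools sharing one Meraki organization at 3, 3 and 2 req/sec
# together use 80% of the 10 req/sec budget, which is already over the 70% target.
shared = utilization([3, 3, 2], 10)

# Polling accountSnapshot once per second against its 1/sec ceiling is 100%.
cato = utilization([1.0], 1.0)
```

Note that `min_poll_interval` only bounds the average rate; a collector that fires the whole cycle in one burst can still hit the limit and needs pacing within the cycle as well.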
Shard API keys
If multiple tools share one API key, assign separate keys per tool. Meraki allows multiple API keys per organization, generated by different dashboard admins. For Cato, same-named queries share a counter across all keys on an account, so sharding keys alone does not resolve per-query collisions. Namespace your queries or stagger identical polling schedules across consumers.
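Staggering identical schedules can be as simple as giving each consumer a fixed start offset within the shared interval. The sketch below is one way to compute evenly spaced offsets; the function name and the consumer names in the usage line are hypothetical.

```python
def stagger_offsets(consumers, interval_s):
    """Return a start offset in seconds for each consumer, spread evenly
    across one poll interval so identical queries do not fire together."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    step = interval_s / len(consumers)
    return {name: i * step for i, name in enumerate(consumers)}


offsets = stagger_offsets(["nms", "reports", "scripts"], 60)
```

Staggering spreads the calls out but does not raise the shared per-query ceiling, so the combined rate across consumers still has to fit under it.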
Implement exponential backoff with Retry-After
Respect the Retry-After header on 429 responses. The Meraki Python SDK (meraki/dashboard-api-python) retries automatically on 429 by reading this header. Custom integrations must not assume the header is always present; fall back to capped exponential backoff when it is absent. For Cato, pause for a few minutes after a 429 before resuming.
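For a custom integration, the delay calculation might look like the sketch below. It honors a numeric Retry-After value when one is present and otherwise uses capped exponential backoff as the fallback. The function name and the default values for `base`, `cap`, and `jitter` are assumptions chosen for illustration, not vendor recommendations.

```python
import random


def retry_delay(retry_after, attempt, base=1.0, cap=60.0, jitter=0.0):
    """Seconds to wait before retry number `attempt` (0-based).

    retry_after is the raw Retry-After header value, or None if absent.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # non-numeric value (for example an HTTP date): use the fallback
    delay = min(cap, base * 2 ** attempt)
    # Optional random jitter keeps several collectors from retrying in lockstep.
    return delay + random.uniform(0, jitter)
```

A caller would sleep for `retry_delay(response_headers.get("Retry-After"), attempt)` after each 429 and give up after a bounded number of attempts, where `response_headers` stands for whatever header mapping the HTTP client exposes.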
Limit concurrency for PAN-OS
Keep in-flight PAN-OS API calls at or below 5 at any time. Exceeding this degrades the management-plane web server and can cause request failures that look like timeouts. If you need higher throughput, use Panorama as an aggregation point and batch requests. For PAN-OS API key management, keys are generated via /api/?type=keygen&user=<user>&password=<password>. This exposes credentials in the URL; prefer generating keys via the web UI when possible. The key is passed as the X-PAN-KEY header or key= query parameter.
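One way to enforce the in-flight ceiling inside a single collector process is a semaphore around every call to a given firewall or Panorama. This is a minimal sketch: `PanosGate` and `MAX_IN_FLIGHT` are hypothetical names, and `fn` stands for whatever function actually performs the HTTP request.

```python
import threading

MAX_IN_FLIGHT = 5  # the concurrency ceiling recommended in the text above


class PanosGate:
    """Caps concurrent API calls to one firewall or Panorama (one gate per target)."""

    def __init__(self, limit=MAX_IN_FLIGHT):
        self._sem = threading.BoundedSemaphore(limit)

    def call(self, fn, *args, **kwargs):
        # Blocks while `limit` calls are already in flight, then runs fn.
        with self._sem:
            return fn(*args, **kwargs)
```

A gate like this only limits the process it lives in. If several collectors or scripts talk to the same firewall, their combined concurrency still needs to be coordinated.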
Collapse redundant queries
Third-party monitoring templates that issue hundreds of API calls per scan interval, for example pulling per-device metrics across thousands of Meraki devices, routinely exhaust the budget. Replace these with lightweight org-level API calls that return aggregate status. For Cato, consolidate multiple accountMetrics calls into fewer broader queries rather than issuing many narrow ones.
Prevention
- Track API consumption proactively. Monitor the rate of API calls per key per minute against the documented limit, not just the 429 count. Catch consumption at 70% of budget, not at 100%.
- Alert on data freshness, not just errors. When the API is throttled, the collector may stop logging errors and simply stop updating. Track time since last successful API response per vendor.
- Document key ownership. Every API key should have a named owner and a list of consuming systems. When a new tool needs vendor API access, it gets its own key.
- Budget for incident surges. Set steady-state consumption low enough (below 50% of limit) that a 2x surge during an incident does not trigger throttling.
- Validate response bodies, not just HTTP status. PAN-OS returns <response status="error"> inside HTTP 200. A collector that checks only the status code will miss this failure and treat the response as successful with no data.
- Watch for vendor-side tightening. Meraki tightened limits on some administrative endpoints in 2023-2024. Operators relying on historical polling cadences can hit unexpected 429s after vendor-side changes.
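Tracking consumption per key, as the first item in the list suggests, needs little more than a sliding-window counter. The sketch below keeps timestamps per key in memory; the class and function names, the key label, and the 70% default are illustrative assumptions, and timestamps are passed in explicitly so the logic stays easy to test.

```python
from collections import deque


class CallCounter:
    """Counts API calls per key over a sliding time window."""

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self._calls = {}

    def record(self, key, now):
        self._calls.setdefault(key, deque()).append(now)

    def count(self, key, now):
        q = self._calls.get(key)
        if q is None:
            return 0
        # Drop timestamps that have aged out of the window.
        while q and q[0] <= now - self.window_s:
            q.popleft()
        return len(q)


def over_budget(count, limit_per_window, threshold_pct=70):
    """True once consumption reaches threshold_pct percent of the limit."""
    return count * 100 >= limit_per_window * threshold_pct
```

In a real collector, `record` would be called with `time.monotonic()` wherever a request is sent, and `over_budget` would feed an alert well before the first 429 appears.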
How Netdata helps
- Correlate API data freshness with device-level signals. Netdata charts the time since last successful API response alongside ICMP reachability and SNMP data, making it immediately visible when only the API layer is degraded while the network path is healthy.
- Track HTTP 429 rates as a first-class metric. A dedicated chart for vendor API error rates per vendor lets you spot throttling before it causes data gaps.
- Monitor collector-side resource pressure. CPU spikes, worker thread saturation, or queue depth on the collector host can indicate that the collector is over-consuming API budgets.
- Multi-vendor signal correlation. When Meraki, Cato, and PAN-OS API health are charted together, a single-vendor throttling event is immediately distinguishable from a broader connectivity issue or a collector host problem.
- Alert on staleness thresholds. Configurable alerts on data freshness degradation catch the silent gap that occurs when the collector receives 429s but stops logging them as errors.
Related guides
- ARP cache staleness: when IP-to-MAC mapping goes bad
- Asymmetric routing: why your path and latency measurements lie
- Audit log gaps: detecting syslog/trap tampering or loss
- BGP flapping: why a peer keeps resetting and how to find the cause
- BGP NOTIFICATION and Cease messages: what each subcode is telling you
- BGP RIB and FIB growth: monitoring route-table size before it bites
- BGP route leak and hijack: the detection signals and alerts that matter
- BGP session Established but stale: detecting silent route loss
- Cold-start topology: why your map is incomplete after a collector restart
- Locating endpoints behind NAT and wireless: the positioning problem
- Stale FDB/MAC tables: why endpoint location is wrong
- NetFlow storage sizing: how much disk your flow collector really needs







