ClickHouse Memory limit (total) exceeded - server-wide memory pressure and fixes

Code: 241. DB::Exception: Memory limit (total) exceeded: would use X bytes, current RSS Y, maximum Z. is the server-level cap, not a per-query limit. When ClickHouse’s MemoryTracking hits max_server_memory_usage (default 90% of physical RAM), the server kills the heaviest running queries to protect the process. New and existing queries fail until memory drops. Find the largest memory consumer and stop it before the OOM killer does.

Unlike a per-query max_memory_usage breach, the (total) variant means the aggregate across all queries, background merges, caches, and dictionaries has crossed the server ceiling. It often arrives suddenly: a runaway JOIN, a burst of concurrent heavy GROUP BY queries, or a background merge spike pushes the tracked total over the line. ClickHouse uses a hierarchical memory tracker (server to user to query), and the server limit is the final backstop.

This error does not mean you need more RAM. It means the current workload is unsafe for the configured limit. Diagnose whether the pressure comes from a single query, concurrency, or background work, and recover without restarting the server.

What this means

ClickHouse tracks allocations through a hierarchy: a server-level MemoryTracker, then per-user trackers, then per-query trackers. The server limit is controlled by max_server_memory_usage. Left at default, ClickHouse derives the limit from physical RAM and caps tracked memory at 90%. That value lives in system.metrics as MemoryTracking.

When MemoryTracking approaches that ceiling, the server kills the heaviest queries to free memory. The error message Memory limit (total) exceeded is exception code 241. It is distinct from a query-level breach, which references per-query max_memory_usage instead of the server total.

The tracked value is an approximation. Resident Set Size (RSS) reported by the OS can exceed MemoryTracking because of allocator overhead, memory-mapped files, and untracked allocations. Relying on MemoryTracking alone can miss fragmentation that still exposes the process to the OOM killer.

flowchart TD
    A[Physical RAM] --> B[max_server_memory_usage]
    B --> C[Server MemoryTracker]
    C --> D[User tracker]
    C --> E[Background merges and caches]
    D --> F[Query A]
    D --> G[Query B]
    C -.-> H[Kill heaviest query when limit exceeded]

Common causes

CauseWhat it looks likeFirst thing to check
Runaway query with bad JOIN or huge GROUP BYOne query in system.processes has peak_memory_usage far above the rest, often with long elapsed timesystem.processes ordered by memory_usage DESC
Concurrent heavy query stormMany queries each using moderate memory; sum(memory_usage) from system.processes is near the server limitTotal query memory versus MemoryTracking
Background merge or mutation spikeMerge memory_usage in system.merges is elevated while active query count is lowSELECT sum(memory_usage) FROM system.merges
Cache or dictionary growthMemoryTracking climbs steadily without heavy queries; dictionaries may be reloadingsystem.dictionaries and cache metrics in system.asynchronous_metrics
Spill-to-disk feedback loop/var/lib/clickhouse/tmp/ is growing; query latency rises as I/O substitutes for RAMTemp directory size and query_log for spilling patterns

Quick checks

# Check OS-level resident memory for the ClickHouse process
for pid in $(pgrep clickhouse-server); do echo "PID $pid"; grep -E 'VmRSS|VmSize|VmPeak' /proc/$pid/status; done
-- Server-level tracked memory and OS-level resident memory
SELECT
    (SELECT value FROM system.metrics WHERE metric = 'MemoryTracking') AS tracked_bytes,
    (SELECT formatReadableSize(value) FROM system.asynchronous_metrics WHERE metric = 'MemoryResident') AS rss,
    (SELECT formatReadableSize(value) FROM system.asynchronous_metrics WHERE metric = 'OSMemoryAvailable') AS os_available
-- Heaviest running queries by current memory
SELECT
    query_id,
    user,
    elapsed,
    formatReadableSize(memory_usage) AS current_mem,
    formatReadableSize(peak_memory_usage) AS peak_mem,
    substring(query, 1, 120) AS query_prefix
FROM system.processes
WHERE memory_usage > 0
ORDER BY memory_usage DESC
LIMIT 10;
-- Total memory held by all running queries
SELECT formatReadableSize(sum(memory_usage)) AS total_query_memory FROM system.processes;
-- Total memory held by background merges
SELECT formatReadableSize(sum(memory_usage)) AS total_merge_memory FROM system.merges;
-- Recent MEMORY_LIMIT_EXCEEDED errors
SELECT
    event_time,
    query_id,
    user,
    substring(query, 1, 120) AS query_prefix
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'
  AND exception_code = 241
  AND event_time > now() - INTERVAL 10 MINUTE
ORDER BY event_time DESC;
# Check for spill-to-disk activity
ls -lh /var/lib/clickhouse/tmp/ 2>/dev/null | head -20

How to diagnose it

  1. Confirm the server limit is the problem. Query system.metrics for MemoryTracking. If it is near max_server_memory_usage and the error says (total), you are at the server cap.
  2. Classify the consumer. Compare total query memory (sum from system.processes) against total merge memory (sum from system.merges). If queries dominate, the fix is query-side. If merges dominate, the pressure is background workload.
  3. Find the runaway query. Sort system.processes by memory_usage DESC. Note query_id, user, elapsed, and peak_memory_usage. A single query using more than 50% of the server limit is the obvious target, and it is likely the one the server will kill first.
  4. Check for a concurrency pile-up. If no single query is huge but the sum of query memory exceeds 70% of the server limit, too many concurrent heavy queries are the issue.
  5. Inspect recent exceptions. Query system.query_log for exception_code = 241 in the last 10 minutes. If the same query pattern repeats, it is a bad query. If many distinct queries fail, the system is universally saturated.
  6. Look for spill-to-disk. If max_bytes_before_external_group_by is configured, queries may spill to /var/lib/clickhouse/tmp/ instead of failing. Large temp files mean memory pressure has already forced queries onto disk, which can starve merges of I/O and worsen the cascade.
  7. Compare RSS to tracked memory. Check system.asynchronous_metrics MemoryResident against MemoryTracking. If RSS is significantly higher, untracked allocations or allocator fragmentation are contributing. This explains why the OOM killer may strike before ClickHouse throttles queries.
  8. If queries and merges do not explain the pressure, check system.dictionaries for recent reloads and review mark cache size in system.asynchronous_metrics. Large dictionaries and oversized caches can consume server-level memory outside of query execution.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
MemoryTracking / max_server_memory_usageTracked allocation approaching the server capTracked memory > 80% of limit
MemoryResident (RSS)Actual OS memory; the OOM killer sees this, not the internal trackerRSS > 90% of physical RAM
Per-query memory_usageIdentifies runaway queries before they threaten the serverSingle query > 50% of server limit
Sum of query memory_usageReveals concurrency-driven pressureTotal query memory > 70% of server limit
Merge memory_usage from system.mergesBackground merges can consume significant RAM during large consolidationsMerge memory unexpectedly high while parts accumulate
Exception code 241 in query_logEarly indicator of memory pressure trendSustained rate above baseline
OSMemoryAvailableSystem-level headroom before OOMAvailable memory < 5% of total
Spill-to-disk temp directory growthSignals queries are trading memory for I/O, which can cascade into merge starvation/var/lib/clickhouse/tmp/ growing steadily

Fixes

Kill the runaway query

Use KILL QUERY WHERE query_id = '...' on the heaviest consumer from system.processes. This is the fastest way to drop memory usage. The tradeoff is lost work. For ad-hoc analytics this is usually acceptable. For ETL jobs, coordinate with the owner before killing.

Reduce concurrent query load

If many moderate queries exhaust memory together, lower max_concurrent_queries temporarily to force queuing instead of simultaneous execution. This prevents aggregate memory from spiking. The tradeoff is increased query latency, but it keeps the server stable.

Enforce per-query memory limits

Set max_memory_usage in user profiles so individual queries hit their own ceiling before threatening the server limit. The tradeoff is that legitimate large queries fail earlier, but they fail predictably and do not destabilize the node.

Tune spill-to-disk behavior

If you configure max_bytes_before_external_group_by, queries spill rather than fail. When spill volume in /var/lib/clickhouse/tmp/ grows, disk I/O rises and merges slow down. Raise the threshold only if you have proven I/O headroom, or lower it to fail fast and force query rewriting. The tradeoff is query failure versus system-wide I/O saturation.

Reduce cache or dictionary memory

If mark cache or uncompressed cache is oversized relative to RAM, reduce mark_cache_size or uncompressed_cache_size in the server configuration. If large dictionaries reload during peak hours, reschedule reloads or reduce dictionary size. The tradeoff is more disk I/O on cold queries.

Optimize the underlying queries

Fix Cartesian products by adding proper JOIN conditions. Add LIMIT to high-cardinality GROUP BY queries. Reduce DISTINCT on large datasets. These changes reduce peak_memory_usage at the source. This is the only fix that does not trade capacity for stability, but it requires development time.

Prevention

  • Configure max_memory_usage per user profile so analytical users cannot allocate unbounded memory.
  • Set max_concurrent_queries based on worst-case memory: typical heavy query memory multiplied by desired concurrency should stay below 70% of max_server_memory_usage.
  • Monitor the 95th percentile of MemoryTracking over 5-minute windows. If the peak trends upward over days, investigate before it intersects the limit.
  • Require EXPLAIN or dry-run review for new JOIN and GROUP BY patterns against large tables.
  • Alert when MemoryResident exceeds MemoryTracking by a large ratio. This catches allocator fragmentation and untracked growth that the internal tracker misses.
  • Disable or minimally configure swap. ClickHouse on swap is effectively unresponsive.

How Netdata helps

Netdata tracks the clickhouse-server process RSS alongside system-level memory metrics. Compare RSS against ClickHouse’s internal MemoryTracking to spot divergence before the OOM killer intervenes.

The ClickHouse collector exposes MemoryTracking, query counts, and exception rates. Correlate spikes in code 241 with query concurrency and memory metrics.

Disk latency and I/O utilization charts show when spill-to-disk from memory pressure begins starving background merges.

Process memory charts show whether clickhouse-server RSS climbs independently of tracked query memory, indicating untracked allocation or cache growth.

Query latency metrics help spot the performance degradation that precedes explicit memory limit errors.