ClickHouse long-running queries: finding and killing the resource hog
A query that should finish in seconds is still running after twenty minutes. Memory on the ClickHouse node is climbing, query latency has doubled, and you suspect a single query is holding resources it will never release. In ClickHouse, a long-running query can be a legitimate analytical job crunching terabytes, a Cartesian JOIN exploding in memory, or a GROUP BY that has spilled to disk and slowed to a crawl. Telling the difference determines whether you kill it or let it finish.
The live view is system.processes. It shows wall-clock elapsed time, current memory_usage, read_rows, and the query text. High elapsed alone does not mean the query is broken; some workloads expect hours-long execution. The danger is the resource hog that is stalled, spilling, or multiplying rows without meaningful progress. Left alone, it can exhaust memory, saturate disk I/O, or trigger OOM kills that destabilize the node.
What this means
ClickHouse queries execute as a pipeline of processors that hold memory for hash tables, sort buffers, and decompressed blocks. A long-running query keeps those allocations until it finishes or is cancelled. It also holds file descriptors for the parts it reads. A large JOIN or GROUP BY may allocate until it hits the per-query memory limit, then either fail with exception code 241 (MEMORY_LIMIT_EXCEEDED) or spill temporary data to /var/lib/clickhouse/tmp/. Spill-to-disk turns a memory problem into an I/O problem. The query slows by orders of magnitude but continues holding resources.
system.processes.elapsed measures wall-clock time, not CPU time. A query can show high elapsed with almost no CPU use if it is waiting on disk I/O after spilling, waiting on a lock, or blocked by a distributed subquery that is itself stalled. Correlate elapsed with read_rows and memory_usage to separate real work from a runaway process.
flowchart TD
A[Query elapsed high] --> B{read_rows increasing?}
B -->|Yes| C[Legitimate heavy ETL]
B -->|No| D{memory_usage high?}
D -->|Yes| E[Runaway query]
D -->|No| F{tmp directory growing?}
F -->|Yes| G[Spill to disk]
F -->|No| H[Stuck or blocked]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Cartesian-product JOIN | memory_usage climbs rapidly; query text shows missing or broad JOIN keys | system.processes.memory_usage and the query text for missing ON clauses |
| OOM-induced spill-to-disk | elapsed high, read_rows flat or advancing slowly; temp directory grows | /var/lib/clickhouse/tmp/ size and read_rows delta over 30-60 seconds |
| Full table scan from partition pruning failure | read_rows increases linearly across the entire table, far exceeding expected scope | system.query_log for read_rows vs expected rows from the same query pattern |
| Runaway GROUP BY or DISTINCT | peak_memory_usage is very high; may precede exception code 241 | system.processes.peak_memory_usage vs the user max_memory_usage limit |
| Legitimate heavy ETL | read_rows advances steadily, memory is stable, and the query runs in a known batch window | system.processes.read_rows sampled twice over a minute |
Quick checks
Run these read-only checks to size up the situation.
-- List queries running longer than 60 seconds
SELECT
query_id,
user,
elapsed,
formatReadableSize(memory_usage) AS mem_used,
read_rows,
substring(query, 1, 200) AS query_prefix
FROM system.processes
WHERE elapsed > 60
ORDER BY elapsed DESC
LIMIT 20;
-- Total memory held by all running queries
SELECT
formatReadableSize(sum(memory_usage)) AS total_query_memory,
count(*) AS running_queries
FROM system.processes;
-- Server-level tracked memory
SELECT metric, value
FROM system.metrics
WHERE metric = 'MemoryTracking';
-- Top queries by current memory usage
SELECT
query_id,
user,
elapsed,
formatReadableSize(memory_usage) AS mem_used,
formatReadableSize(peak_memory_usage) AS peak_mem,
read_rows,
substring(query, 1, 200) AS query_prefix
FROM system.processes
ORDER BY memory_usage DESC
LIMIT 10;
-- Sample read_rows for a specific suspect query
-- Run this twice, 30 to 60 seconds apart, to see progress
SELECT query_id, read_rows
FROM system.processes
WHERE query_id = '<query_id>';
# Check temp directory for spill-to-disk activity
du -sh /var/lib/clickhouse/tmp/
-- Recent query failures by exception code
SELECT
exception_code,
count() AS cnt,
any(exception) AS sample
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'
AND event_time > now() - INTERVAL 10 MINUTE
GROUP BY exception_code
ORDER BY cnt DESC;
How to diagnose it
Identify candidates. Query
system.processeswithelapsed > 60(or your workload threshold). Sort byelapsed DESCto see the oldest queries, and bymemory_usage DESCto see the heaviest consumers.Check progress. Run
SELECT read_rows FROM system.processes WHERE query_id = '...'twice, 30 to 60 seconds apart. Ifread_rowsincreases, the query is actively scanning and may be legitimate heavy ETL. Ifread_rowsis static, the query is stalled.Inspect memory footprint. High
memory_usageorpeak_memory_usagewith flatread_rowssuggests a runaway query, such as a Cartesian product or an unbounded hash table.Look for spill-to-disk. Check
/var/lib/clickhouse/tmp/withdu. If it grows while the query runs and CPU is low, the query exceeded memory limits and is writing temporary aggregation or sort data to disk. This happens whenmax_bytes_before_external_group_byis reached or when the server memory limit is hit.Examine the query text. Use
substring(query, 1, 500)fromsystem.processes. Look for JOINs without equality conditions, missingWHEREclauses on the partition key, orSELECT *across huge tables.Correlate with failures. Query
system.query_logforExceptionWhileProcessingentries with code 241. A cluster of memory errors around the same time suggests the long query is part of a broader memory pressure event.Assess server-level impact. Compare
MemoryTrackingfromsystem.metricsagainst the servermax_server_memory_usagesetting. If the server is above 80% and the long query is the top consumer, it is starving other queries.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
system.processes.elapsed | Identifies queries exceeding expected duration | Query running > 10x expected duration or > 30 minutes (workload-dependent) |
system.processes.memory_usage | Reveals resource hoarding per query | Single query using > 50% of max_server_memory_usage |
system.processes.read_rows | Distinguishes progress from stagnation | read_rows unchanged over 60+ seconds on a supposedly active query |
system.events.FailedQuery / FailedSelectQuery | Reveals systemic query failures | Sustained error rate > 1% of total queries over 5 minutes |
MemoryTracking vs max_server_memory_usage | Shows server-level memory pressure | MemoryTracking > 80% of limit |
Temp directory size (/var/lib/clickhouse/tmp/) | Indicates spill-to-disk from memory limits | Growth correlating with a long-running query |
system.query_log.query_duration_ms (P99) | Tail latency degradation | Sustained P99 > 2x baseline for > 15 minutes |
Fixes
Kill a runaway or stuck query
Use KILL QUERY WHERE query_id = '<query_id>' on the node where the query is executing. This sends a cancellation signal; the query stops at the next cancellation point. Read-only SELECT queries can usually be killed safely, though the client receives an error.
Warning: Be cautious with INSERT or ALTER operations. Killing an INSERT may leave partial data. Background mutations do not appear in system.processes; use KILL MUTATION WHERE ... in system.mutations instead.
Address spill-to-disk
If the query is slow because it is spilling, choose whether to let it finish or kill it and rewrite. To prevent recurrence, reduce GROUP BY cardinality, add filters, or tune max_bytes_before_external_group_by so spill is controlled rather than catastrophic. Tradeoff: more spill means more disk I/O and slower execution.
Fix the query plan
For full table scans caused by partition pruning failure, rewrite the WHERE clause to include the partition key. For Cartesian JOINs, add explicit ON conditions. See the related guide on full table scans.
Set guardrails with max_execution_time
Apply max_execution_time at the user profile level to cap interactive queries. Legitimate ETL can run under a separate profile with a higher limit or zero. Tradeoff: an overly aggressive limit kills valid long jobs.
Throttle heavy legitimate queries
If the query is valid but resource-intensive, run it under a dedicated user profile with lower max_threads to reduce CPU contention, or schedule it outside peak hours.
Prevention
- Set per-user
max_execution_timeandmax_memory_usage. Interactive users should have a low ceiling; ETL service accounts can have higher limits. This prevents accidental runaways from ever starting. - Review
system.query_logweekly. Look for queries with highread_rowsormemory_usagethat are new or trending upward. Catch pattern changes before they become incidents. - Validate partition key filters in application queries. Missing partition keys are the most common cause of unexpected full scans that turn into long-running resource hogs.
- Monitor
system.processesas a leading indicator. An alert onelapsed > 300catches hogs before they dominate node resources. - Size
max_bytes_before_external_group_bydeliberately. If you rely on spill-to-disk, set it so spills are predictable and do not fill the data disk.
How Netdata helps
- ClickHouse query latency and memory charts shown alongside host CPU, memory, and disk I/O confirm whether a long query is saturating the node.
- Alerts on rising
MemoryTrackingalongside per-query memory spikes detect memory pressure cascades early. - Disk latency charts for the data volume confirm spill-to-disk I/O contention.
FailedQueryand exception code tracking show when long queries start failing across the cluster.
Related guides
- ClickHouse active part count growing: reading MaxPartCountForPartition before it pages
- ClickHouse ALTER UPDATE/DELETE overuse: why mutations are not row updates
- ClickHouse async inserts: when async_insert fixes too-many-parts and when it hides it
- ClickHouse mark cache and uncompressed cache: reading low hit rates
- ClickHouse DelayedInserts climbing: the warning before too-many-parts
- ClickHouse detached parts piling up: reading system.detached_parts and reclaiming space
- ClickHouse disk space collapse: why merges need free space and how the spiral starts
- ClickHouse disk space monitoring: free_space, unreserved_space, and the 80% target
- ClickHouse distributed DDL stuck: ON CLUSTER queries that never finish
- ClickHouse distributed query amplification: one coordinator, many shard subqueries
- ClickHouse full table scan: partition pruning failures and the primary key
- ClickHouse insert latency rising: the leading indicator of write-pipeline trouble







