ClickHouse long-running queries: finding and killing the resource hog

A query that should finish in seconds is still running after twenty minutes. Memory on the ClickHouse node is climbing, query latency has doubled, and you suspect a single query is holding resources it will never release. In ClickHouse, a long-running query can be a legitimate analytical job crunching terabytes, a Cartesian JOIN exploding in memory, or a GROUP BY that has spilled to disk and slowed to a crawl. Telling the difference determines whether you kill it or let it finish.

The live view is system.processes. It shows wall-clock elapsed time, current memory_usage, read_rows, and the query text. High elapsed alone does not mean the query is broken; some workloads expect hours-long execution. The danger is the resource hog that is stalled, spilling, or multiplying rows without meaningful progress. Left alone, it can exhaust memory, saturate disk I/O, or trigger OOM kills that destabilize the node.

What this means

ClickHouse queries execute as a pipeline of processors that hold memory for hash tables, sort buffers, and decompressed blocks. A long-running query keeps those allocations until it finishes or is cancelled. It also holds file descriptors for the parts it reads. A large JOIN or GROUP BY may allocate until it hits the per-query memory limit, then either fail with exception code 241 (MEMORY_LIMIT_EXCEEDED) or spill temporary data to /var/lib/clickhouse/tmp/. Spill-to-disk turns a memory problem into an I/O problem. The query slows by orders of magnitude but continues holding resources.

system.processes.elapsed measures wall-clock time, not CPU time. A query can show high elapsed with almost no CPU use if it is waiting on disk I/O after spilling, waiting on a lock, or blocked by a distributed subquery that is itself stalled. Correlate elapsed with read_rows and memory_usage to separate real work from a runaway process.

flowchart TD
    A[Query elapsed high] --> B{read_rows increasing?}
    B -->|Yes| C[Legitimate heavy ETL]
    B -->|No| D{memory_usage high?}
    D -->|Yes| E[Runaway query]
    D -->|No| F{tmp directory growing?}
    F -->|Yes| G[Spill to disk]
    F -->|No| H[Stuck or blocked]

Common causes

CauseWhat it looks likeFirst thing to check
Cartesian-product JOINmemory_usage climbs rapidly; query text shows missing or broad JOIN keyssystem.processes.memory_usage and the query text for missing ON clauses
OOM-induced spill-to-diskelapsed high, read_rows flat or advancing slowly; temp directory grows/var/lib/clickhouse/tmp/ size and read_rows delta over 30-60 seconds
Full table scan from partition pruning failureread_rows increases linearly across the entire table, far exceeding expected scopesystem.query_log for read_rows vs expected rows from the same query pattern
Runaway GROUP BY or DISTINCTpeak_memory_usage is very high; may precede exception code 241system.processes.peak_memory_usage vs the user max_memory_usage limit
Legitimate heavy ETLread_rows advances steadily, memory is stable, and the query runs in a known batch windowsystem.processes.read_rows sampled twice over a minute

Quick checks

Run these read-only checks to size up the situation.

-- List queries running longer than 60 seconds
SELECT
    query_id,
    user,
    elapsed,
    formatReadableSize(memory_usage) AS mem_used,
    read_rows,
    substring(query, 1, 200) AS query_prefix
FROM system.processes
WHERE elapsed > 60
ORDER BY elapsed DESC
LIMIT 20;
-- Total memory held by all running queries
SELECT
    formatReadableSize(sum(memory_usage)) AS total_query_memory,
    count(*) AS running_queries
FROM system.processes;
-- Server-level tracked memory
SELECT metric, value
FROM system.metrics
WHERE metric = 'MemoryTracking';
-- Top queries by current memory usage
SELECT
    query_id,
    user,
    elapsed,
    formatReadableSize(memory_usage) AS mem_used,
    formatReadableSize(peak_memory_usage) AS peak_mem,
    read_rows,
    substring(query, 1, 200) AS query_prefix
FROM system.processes
ORDER BY memory_usage DESC
LIMIT 10;
-- Sample read_rows for a specific suspect query
-- Run this twice, 30 to 60 seconds apart, to see progress
SELECT query_id, read_rows
FROM system.processes
WHERE query_id = '<query_id>';
# Check temp directory for spill-to-disk activity
du -sh /var/lib/clickhouse/tmp/
-- Recent query failures by exception code
SELECT
    exception_code,
    count() AS cnt,
    any(exception) AS sample
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'
  AND event_time > now() - INTERVAL 10 MINUTE
GROUP BY exception_code
ORDER BY cnt DESC;

How to diagnose it

  1. Identify candidates. Query system.processes with elapsed > 60 (or your workload threshold). Sort by elapsed DESC to see the oldest queries, and by memory_usage DESC to see the heaviest consumers.

  2. Check progress. Run SELECT read_rows FROM system.processes WHERE query_id = '...' twice, 30 to 60 seconds apart. If read_rows increases, the query is actively scanning and may be legitimate heavy ETL. If read_rows is static, the query is stalled.

  3. Inspect memory footprint. High memory_usage or peak_memory_usage with flat read_rows suggests a runaway query, such as a Cartesian product or an unbounded hash table.

  4. Look for spill-to-disk. Check /var/lib/clickhouse/tmp/ with du. If it grows while the query runs and CPU is low, the query exceeded memory limits and is writing temporary aggregation or sort data to disk. This happens when max_bytes_before_external_group_by is reached or when the server memory limit is hit.

  5. Examine the query text. Use substring(query, 1, 500) from system.processes. Look for JOINs without equality conditions, missing WHERE clauses on the partition key, or SELECT * across huge tables.

  6. Correlate with failures. Query system.query_log for ExceptionWhileProcessing entries with code 241. A cluster of memory errors around the same time suggests the long query is part of a broader memory pressure event.

  7. Assess server-level impact. Compare MemoryTracking from system.metrics against the server max_server_memory_usage setting. If the server is above 80% and the long query is the top consumer, it is starving other queries.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
system.processes.elapsedIdentifies queries exceeding expected durationQuery running > 10x expected duration or > 30 minutes (workload-dependent)
system.processes.memory_usageReveals resource hoarding per querySingle query using > 50% of max_server_memory_usage
system.processes.read_rowsDistinguishes progress from stagnationread_rows unchanged over 60+ seconds on a supposedly active query
system.events.FailedQuery / FailedSelectQueryReveals systemic query failuresSustained error rate > 1% of total queries over 5 minutes
MemoryTracking vs max_server_memory_usageShows server-level memory pressureMemoryTracking > 80% of limit
Temp directory size (/var/lib/clickhouse/tmp/)Indicates spill-to-disk from memory limitsGrowth correlating with a long-running query
system.query_log.query_duration_ms (P99)Tail latency degradationSustained P99 > 2x baseline for > 15 minutes

Fixes

Kill a runaway or stuck query

Use KILL QUERY WHERE query_id = '<query_id>' on the node where the query is executing. This sends a cancellation signal; the query stops at the next cancellation point. Read-only SELECT queries can usually be killed safely, though the client receives an error.

Warning: Be cautious with INSERT or ALTER operations. Killing an INSERT may leave partial data. Background mutations do not appear in system.processes; use KILL MUTATION WHERE ... in system.mutations instead.

Address spill-to-disk

If the query is slow because it is spilling, choose whether to let it finish or kill it and rewrite. To prevent recurrence, reduce GROUP BY cardinality, add filters, or tune max_bytes_before_external_group_by so spill is controlled rather than catastrophic. Tradeoff: more spill means more disk I/O and slower execution.

Fix the query plan

For full table scans caused by partition pruning failure, rewrite the WHERE clause to include the partition key. For Cartesian JOINs, add explicit ON conditions. See the related guide on full table scans.

Set guardrails with max_execution_time

Apply max_execution_time at the user profile level to cap interactive queries. Legitimate ETL can run under a separate profile with a higher limit or zero. Tradeoff: an overly aggressive limit kills valid long jobs.

Throttle heavy legitimate queries

If the query is valid but resource-intensive, run it under a dedicated user profile with lower max_threads to reduce CPU contention, or schedule it outside peak hours.

Prevention

  • Set per-user max_execution_time and max_memory_usage. Interactive users should have a low ceiling; ETL service accounts can have higher limits. This prevents accidental runaways from ever starting.
  • Review system.query_log weekly. Look for queries with high read_rows or memory_usage that are new or trending upward. Catch pattern changes before they become incidents.
  • Validate partition key filters in application queries. Missing partition keys are the most common cause of unexpected full scans that turn into long-running resource hogs.
  • Monitor system.processes as a leading indicator. An alert on elapsed > 300 catches hogs before they dominate node resources.
  • Size max_bytes_before_external_group_by deliberately. If you rely on spill-to-disk, set it so spills are predictable and do not fill the data disk.

How Netdata helps

  • ClickHouse query latency and memory charts shown alongside host CPU, memory, and disk I/O confirm whether a long query is saturating the node.
  • Alerts on rising MemoryTracking alongside per-query memory spikes detect memory pressure cascades early.
  • Disk latency charts for the data volume confirm spill-to-disk I/O contention.
  • FailedQuery and exception code tracking show when long queries start failing across the cluster.