ClickHouse mutations silently blocking merges: the hidden cause of part growth

You are watching part counts climb on a ClickHouse node. system.merges shows active background tasks, health checks return Ok, and the log shows no mutation errors. Yet inserts are slowing and MaxPartCountForPartition is trending toward the delay threshold. The pool looks busy, so merges should be keeping up. They are not: some of those busy slots are mutations, and mutations starve merges silently.

ClickHouse implements ALTER UPDATE and ALTER DELETE as mutations: asynchronous background jobs that rewrite matching data parts. Mutations and regular merges share the same BackgroundMergesAndMutationsPool. A mutation holds one pool slot for minutes to hours. If the backlog is large enough, merges fall behind insert-driven part creation and the system drifts toward the Too many parts threshold without logging a mutation-specific error.

By default, a mutation starts only when at least number_of_free_entries_in_pool_to_execute_mutation slots in the pool are free. The default is commonly cited as 20 on a 32-slot pool . On a busy server with steady inserts, merges rarely free that many slots, so mutations sit while parts accumulate.

What this means

Every INSERT creates immutable data parts. Background merges consolidate smaller parts to keep query performance stable and avoid file descriptor exhaustion. ALTER UPDATE and ALTER DELETE are not in-place edits; they create mutation tasks that rewrite every matching part. Each mutation consumes one slot in the shared pool, runs to completion or until killed with KILL MUTATION, and survives server restarts.

The default pool size is background_pool_size (16) multiplied by background_merges_mutations_concurrency_ratio (2), giving 32 slots. On an active node, merges keep the pool occupied, mutations cannot start, and the queue grows invisible to operators who only watch part counts.

When a mutation finally runs, it monopolizes its slot. Regular merges are starved, inserts create new parts that are not consolidated, and part count climbs. system.merges shows active tasks, but the work is mutation, not merge.

flowchart TD
    A[ALTER UPDATE/DELETE creates mutation] --> B[Mutation waits for pool slot]
    B --> C[Mutation occupies pool slot]
    C --> D[Regular merges starved]
    D --> E[Parts accumulate from inserts]
    E --> F[DelayedInserts then RejectedInserts]
    C --> G[system.merges shows is_mutation=1]
    G --> H[Operator misses the blockage]

Common causes

CauseWhat it looks likeFirst thing to check
Heavy ALTER UPDATE/DELETE on a large tablesystem.mutations shows is_done = 0 with parts_to_do not decreasingSELECT * FROM system.mutations WHERE is_done = 0
Mutation queue backlogMultiple mutations queued per table; each runs sequentiallySELECT database, table, count() FROM system.mutations WHERE is_done = 0 GROUP BY database, table
Pool slot monopolized by a long mutationsystem.merges shows is_mutation = 1 with high elapsedSELECT is_mutation, elapsed, progress FROM system.merges ORDER BY elapsed DESC
Replicated replica lagging on mutationsOne replica has mutations stuck while others are caught upQuery system.mutations on each replica or use clusterAllReplicas
Insufficient pool headroom for burst workloadBackgroundMergesAndMutationsPoolTask near pool size even without mutationsSELECT metric, value FROM system.metrics WHERE metric LIKE 'Background%Pool%'

Quick checks

-- Count incomplete mutations
SELECT count(*) AS running_mutations FROM system.mutations WHERE is_done = 0;
-- List active mutations with remaining work
SELECT database, table, mutation_id, command, create_time, parts_to_do, latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;
-- Distinguish merges from mutations in the pool
SELECT
  count(*) AS active_tasks,
  countIf(is_mutation = 1) AS mutations,
  countIf(is_mutation = 0) AS regular_merges
FROM system.merges;
-- Check pool utilization
SELECT metric, value FROM system.metrics WHERE metric LIKE 'Background%Pool%';
-- Find worst partition part counts
SELECT database, table, partition_id, count(*) AS parts_in_partition
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY parts_in_partition DESC
LIMIT 10;
-- Check if inserts are already suffering
SELECT event, value FROM system.events WHERE event IN ('DelayedInserts', 'RejectedInserts');
-- Check for failed mutations that will never complete
SELECT database, table, mutation_id, latest_fail_reason
FROM system.mutations
WHERE latest_fail_reason != '';
-- Check mutation status across a cluster
SELECT hostName() AS replica, database, table, mutation_id, parts_to_do
FROM clusterAllReplicas('your_cluster_name', system.mutations)
WHERE is_done = 0
ORDER BY replica, create_time;

How to diagnose it

  1. Confirm part count is growing. Query system.parts for active parts per partition twice, 60 seconds apart. If the worst partition is growing by more than a few parts per minute and regular merges are not completing, the merge subsystem is falling behind.

  2. Inspect what is running in the pool. Query system.merges. Look for is_mutation = 1. If mutations account for most or all active tasks, they are the blockage. Note elapsed and progress. A mutation with progress stuck for minutes is likely the culprit.

  3. Inspect the mutation queue. Query system.mutations where is_done = 0. Check parts_to_do. If it is flat or increasing over a 5-minute window, the mutation is not making progress. Check latest_fail_reason; a non-empty value means the mutation is retrying and will not self-resolve.

  4. Measure pool saturation. Query system.metrics for BackgroundMergesAndMutationsPoolTask. If this value is at or near the pool size, every slot is occupied. When mutations hold those slots, regular merges cannot enter the pool.

  5. Check insert health. Query system.events for DelayedInserts and RejectedInserts. Even small increments of DelayedInserts confirm that part accumulation is already affecting the write pipeline.

  6. Verify cross-replica state for replicated tables. Run the clusterAllReplicas query against system.mutations. If one replica is mutation-blocked while others are healthy, the issue is local to that replica’s background pool or disk.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
system.mutations.parts_to_doTracks remaining work per mutationFlat or increasing over 30 minutes
system.merges.is_mutationDistinguishes mutation tasks from merges in the poolMutations dominate while regular merges drop
BackgroundMergesAndMutationsPoolTaskReveals total pool saturationSustained near pool size for more than 10 minutes
MaxPartCountForPartitionMeasures distance to insert throttlingMore than 50% of parts_to_delay_insert
DelayedInserts / RejectedInsertsConfirms write-path impactAny sustained increase
system.mutations.latest_fail_reasonIdentifies permanently stuck mutationsNon-empty string
Per-replica mutation lagCatches asymmetric stalls in clustersSame mutation is_done on some replicas but not others

Fixes

Kill a non-critical or stuck mutation

Stop a stuck or mistaken mutation with KILL MUTATION:

KILL MUTATION WHERE database = 'db_name' AND table = 'table_name' AND mutation_id = 'mutation_id';

Warning: Killing a mutation is disruptive. Parts already rewritten remain changed; the mutation does not roll back completed work. Subsequent merges and mutations process those parts normally. Use this only when the mutation is safe to discard.

Lower the mutation idle-slot gate

number_of_free_entries_in_pool_to_execute_mutation controls how many slots must be free before a mutation starts. Lowering it lets mutations start with fewer idle slots, which can help a critical mutation finish so merges can resume.

Tradeoff: A lower gate lets mutations start sooner, but mutations still consume pool slots and starve merges while they run. This is a temporary relief valve, not a fix for chronic mutation overload.

Increase background pool capacity

If CPU and I/O headroom exist, raise background_merges_mutations_concurrency_ratio (or background_pool_size on older configurations) to add slots. These settings typically require a server restart or config reload.

Tradeoff: More threads increase disk and CPU contention. Verify that storage latency and CPU utilization are below 70% before raising the pool size, or you may turn merge starvation into I/O saturation.

Reduce mutation frequency

Avoid treating ClickHouse like an OLTP store. Batch changes instead of issuing many small ALTER UPDATE statements. If your version supports lightweight deletes and the use case fits, prefer DELETE FROM over ALTER DELETE; lightweight deletes mark rows with a hidden column rather than rewriting parts immediately, though they still require eventual cleanup merges.

For replicated tables: clear per-replica blockages

Each replica processes mutations independently. If one replica is mutation-blocked while others are healthy, run the diagnostic queries locally on that replica and kill the stuck mutation there.

Do not restart the server to clear a mutation. Mutations are persisted and resume after restart, so a restart only prolongs the stall.

Prevention

  • Monitor mutations as a first-class signal. A dashboard that shows system.parts but not system.mutations will miss the root cause of merge starvation.
  • Alert on mutation stagnation. Alert when parts_to_do is not decreasing over a 15-minute window, and when any mutation has a non-empty latest_fail_reason.
  • Reserve pool headroom. Target background pool utilization below 70% during normal operation. This leaves room for mutations without choking merges.
  • Review schema-change patterns. Flag application code or ETL pipelines that issue frequent ALTER UPDATE/DELETE. ClickHouse mutations are designed for occasional bulk corrections, not continuous row-level updates.
  • Test mutation duration. Before running a mutation on a billion-row table, test it on a representative partition and measure elapsed time and disk I/O.

How Netdata helps

  • Correlate active mutation count from system.mutations with active part count and merge pool utilization. A flat mutation line alongside a climbing part-count line is the classic starvation signature.
  • Alert on the ratio of mutation tasks to total background tasks. When mutations dominate the pool, part growth follows within minutes.
  • Combine DelayedInserts rate with mutation backlog to distinguish mutation blocking from insert overload or poor batching.
  • Surface per-replica mutation states in clustered deployments to catch asymmetric stalls that cluster-wide averages hide.