ClickHouse active part count growing: reading MaxPartCountForPartition before it pages
Rising MaxPartCountForPartition is the leading indicator for the most common ClickHouse production failure: parts accumulating faster than background merges can consolidate them. A single partition crossing 500 active parts means you have hours, not days, before inserts delay and eventually fail with TOO_MANY_PARTS.
The thresholds are per-partition. A table with ten partitions at fifty parts each is healthy; one partition at 950 parts is approaching throttling. Projections create hidden parts inside the same table that count toward the same limits. Materialized views route inserts to separate target tables that can hit their own limits independently.
What this means
Every INSERT creates immutable data parts. Background merges combine smaller parts into larger ones. When inserts outpace merges, merge debt grows exponentially because merge cost increases with part count.
By default, parts_to_delay_insert is 1000 per partition; above this ClickHouse slows inserts. At parts_to_throw_insert (default 3000 in recent releases, 300 in older releases) , inserts are rejected. These defaults are configurable per table, but the mechanics are the same: once merge debt accumulates, recovery becomes harder.
flowchart TD
A[High insert rate or micro-batches] --> B[Active parts per partition rise]
B --> C[Merges fall behind]
C --> D[Part count crosses delay threshold]
D --> E[Insert throttling begins]
E --> F[Part count crosses throw threshold]
F --> G[Inserts rejected with TOO_MANY_PARTS]
C --> H[Query latency degrades
more files to scan]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Micro-batch inserts | Part count rises steadily; many insert queries but few rows per insert | system.query_log for rows written per insert |
| Mutations blocking merges | Part count grows despite active inserts; few regular merges running | system.mutations for is_done = 0 entries |
| Disk space too low for merge temp space | Parts high, merges absent or stuck; disk near full | system.disks unreserved_space |
| Over-partitioned table | One or a few partitions explode while others are flat; hourly or daily partition keys on high-volume tables | system.parts grouped by partition_id |
| Materialized view amplification | Base table part count normal, but MV target tables grow rapidly | Active parts in target tables of attached materialized views |
| Projection bloat | Base table within limits but hidden projection parts drive total count and latency | Whether projections are defined on the table |
Quick checks
Run these read-only queries in order. None block writes or mutate state.
-- Server-wide worst partition
SELECT value FROM system.asynchronous_metrics WHERE metric = 'MaxPartCountForPartition';
-- Per-partition breakdown
SELECT database, table, partition_id, count(*) AS parts_in_partition
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY parts_in_partition DESC
LIMIT 20;
-- Insert batching pattern from the last hour
SELECT
query,
count() AS inserts,
quantiles(0.5, 0.99)(written_rows)[1] AS p50_rows,
quantiles(0.5, 0.99)(written_rows)[2] AS p99_rows
FROM system.query_log
WHERE type = 'QueryFinish'
AND query_kind = 'Insert'
AND event_time > now() - INTERVAL 1 HOUR
GROUP BY query
ORDER BY inserts DESC
LIMIT 10;
-- Currently running merges and mutations
SELECT database, table, elapsed, progress, num_parts, is_mutation
FROM system.merges
ORDER BY elapsed DESC;
-- Pending mutations that may be consuming merge threads
SELECT database, table, mutation_id, command, parts_to_do
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;
-- Insert throttling and rejection counters
SELECT event, value
FROM system.events
WHERE event IN ('DelayedInserts', 'RejectedInserts');
-- Disk space available for merge temporary output
SELECT name, path,
formatReadableSize(free_space) AS free,
formatReadableSize(unreserved_space) AS unreserved
FROM system.disks;
-- Background pool saturation
SELECT metric, value
FROM system.metrics
WHERE metric LIKE 'Background%Pool%';
How to diagnose it
- Confirm the scope is per-partition. Run the per-partition query from Quick checks. A table-level aggregate hides hotspots. One partition at 900 parts is an emergency even if the table average is 100.
- Determine if merges are keeping up. Check
system.merges. If merges are running andprogressis advancing, the system is working but may be under-provisioned for the insert rate. If no merges are running despite high part counts, the pool is blocked or starved. - Check for mutation blockage. Run the
system.mutationsquery. Mutations rewrite entire parts and share the background pool with merges. A single long-running mutation on a large table can monopolize threads and silently allow parts to accumulate. - Verify disk headroom for temporary merge output. Merges write the full merged result before deleting source parts. If
unreserved_spaceinsystem.disksis smaller than the partition being merged, merges will stall. There is no fixed multiplier; ensure enough free space for the largest expected merge output. - Inspect insert batching behavior. Query
system.query_logfor insert patterns. Consistently under 1000 rows per insert means micro-batching. This is the most frequent root cause of part accumulation. - Check for hidden amplification. Materialized views write to separate target tables; check their active part counts. Projections create hidden parts within the base table and inflate the same per-partition limits.
- Correlate with event counters. If
DelayedInsertsis increasing, the system is already throttling. Any non-zeroRejectedInsertsmeans inserts are being dropped and the situation is critical.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
MaxPartCountForPartition | One metric reveals the worst partition across all tables | Sustained value above 500, or positive derivative over 30 minutes |
| Active parts per partition | Limit enforcement and performance degradation are per-partition | Any partition above 50% of parts_to_delay_insert |
Merge activity (system.merges) | Proof that merges are executing and completing | Zero merges running while parts are high and inserts are active |
| Background pool utilization | Whether threads are available to process merges | Sustained above 90% with growing part counts |
DelayedInserts / RejectedInserts | ClickHouse signaling write-path stress | Any increase in RejectedInserts; steady growth in DelayedInserts |
Disk unreserved_space | Merges require temporary space to write output | Approaching less than the largest expected merge output |
| Insert latency P99 | Leading indicator before delay counters increment | Sustained elevation above 2x baseline |
Fixes
Reduce insert rate or increase batch size
If micro-batching is the cause, throttle ingestors or increase client batch sizes to 1000+ rows per insert. If you run ClickHouse 23.x or later, enable async_insert and tune its timeout settings to let the server buffer small inserts into fewer parts. Tradeoff: higher latency before data becomes visible.
Kill mutations that block merges
If system.mutations shows long-running mutations and system.merges is dominated by is_mutation = 1 entries, kill non-critical mutations to free pool capacity:
KILL MUTATION WHERE database = '...' AND table = '...' AND mutation_id = '...';
Tradeoff: the mutation must be reissued later. Do not kill mutations if you rely on their result for data correctness.
Force a merge manually (use with caution)
For a specific hot partition:
OPTIMIZE TABLE db.table PARTITION ID '...' FINAL;
This is CPU and I/O intensive and can contend with inserts on that partition. Run only during low-traffic windows. It is not a substitute for fixing the root cause.
Increase merge concurrency if headroom exists
If CPU and disk I/O are not saturated, you can increase background_merges_mutations_concurrency_ratio. The default is 2. Raising it allows more concurrent merges but increases resource competition with queries. Monitor query latency after the change.
Free disk space immediately
If disk space is the blocker, identify the largest tables:
SELECT database, table,
formatReadableSize(sum(bytes_on_disk)) AS disk_size
FROM system.parts
WHERE active = 1
GROUP BY database, table
ORDER BY disk_size DESC
LIMIT 10;
Detach old partitions to reclaim space quickly. Warning: detached partitions are unavailable for queries until reattached.
ALTER TABLE db.table DETACH PARTITION ID '...';
Do not restart ClickHouse to resolve disk pressure; address the space issue directly.
Prevention
- Alert on the derivative of part count, not just the absolute value. A partition crossing 300 parts is concerning, but a partition growing at 50 parts per hour is an emergency even from a lower base.
- Enforce batch sizes at the client. Target 1000 to 10000 rows per insert. Use
async_insertif you cannot control client behavior. - Keep partition cardinality low. Prefer monthly partitioning (
toYYYYMM) over daily (toYYYYMMDD) for high-volume tables. High-cardinality partition keys multiply parts independently. - Monitor mutations as first-class signals. A forgotten
ALTER UPDATEcan block merges for hours. Watchsystem.mutationsparts_to_dofor stall. - Maintain disk headroom below 80-85% usage. Keep enough unreserved space to write the largest expected merge output; do not rely on a fixed multiplier.
- Account for projections and materialized views. Projections store additional parts inside the same table. Materialized views insert into separate tables that accumulate their own parts. Factor both into part-count budgets.
How Netdata helps
Netdata exposes MaxPartCountForPartition from system.asynchronous_metrics as a gauge, so you see the worst partition without manual queries. Correlate it with background pool task counts, disk I/O latency, and running merge activity on the same timeline to distinguish merge debt from replication lag or query storms. Alert on the derivative of active parts per partition to catch backlog before the delay threshold. For ReplicatedMergeTree, cross-reference with ZooKeeper session health and replication queue depth to separate local merge capacity issues from coordination-layer degradation.
Related guides
- ClickHouse merge death spiral: when parts accumulate faster than merges consolidate
- ClickHouse monitoring checklist: the signals every production cluster needs
- ClickHouse monitoring maturity model: from survival to expert
- ClickHouse DB::Exception: Too many parts - causes and fixes
- How ClickHouse actually works in production: a mental model for operators







